Similar articles
1.
The chicken genome has been sequenced, and this, together with microarray and other functional genomics technologies, makes post-genomic research possible in the chicken. At present, however, such research is hindered by a lack of structural and functional genome annotation. Bio-ontologies have been developed for different annotation requirements and to facilitate data sharing and computational analysis, but they are not yet optimally used in the chicken. Here we discuss genomic annotation and bio-ontologies, focusing specifically on the Gene Ontology (GO), chicken GO annotations, and how these can facilitate functional genomics in the chicken. The GO is the most developed and widely used bio-ontology and is the de facto standard for functional annotation. Despite its critical importance in analyzing microarray and other functional genomics data, relatively few chicken gene products have any GO annotation. Where annotations are available, the average quality of chicken gene product annotations (defined using evidence code weight and annotation depth) is much lower than in mouse. Moreover, tools that allow chicken researchers to use the GO easily and rapidly are either lacking or hard to use. To address these problems we developed ChickGO and AgBase. Chicken GO annotations are provided by complementary work at MSU-AgBase and EBI-GOA. The GO tools pipeline at AgBase uses the GO to derive functional and biological significance from microarray and other functional genomics data. Improved genomic annotation, and tools to use these annotations, will benefit not only the chicken research community but also research in other avian species and comparative genomics.
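The abstract defines annotation quality in terms of evidence code weight and annotation depth but gives no formula. A minimal Python sketch of one such score, assuming a hypothetical weight table and a toy ontology (neither is the authors' actual scheme), averages weight × depth over a gene product's annotations:

```python
from functools import lru_cache

# Hypothetical evidence-code weights: experimental evidence scores higher
# than electronic inference. Illustrative values only.
EVIDENCE_WEIGHT = {"IDA": 5.0, "IMP": 5.0, "ISS": 3.0, "IEA": 1.0}

# Toy GO-like DAG: each term maps to its parent terms (the root has none).
PARENTS = {
    "root": [],
    "metabolic_process": ["root"],
    "lipid_metabolic_process": ["metabolic_process"],
    "fatty_acid_metabolic_process": ["lipid_metabolic_process"],
}


@lru_cache(maxsize=None)
def depth(term: str) -> int:
    """Length of the longest path from `term` up to the root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def annotation_quality(annotations: list[tuple[str, str]]) -> float:
    """Mean of (evidence weight x term depth) over (term, evidence_code) pairs."""
    if not annotations:
        return 0.0
    total = sum(EVIDENCE_WEIGHT[code] * depth(term) for term, code in annotations)
    return total / len(annotations)
```

With this scoring, a deep term backed by experimental evidence contributes far more than a shallow electronic (IEA) annotation, which is the intuition behind comparing chicken and mouse annotation quality.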

2.
Sequence similarity is probably the most widely used tool for inferring functional linkage between proteins. The fully sequenced and extensively studied genome of Saccharomyces cerevisiae gives us an opportunity to compare and statistically quantify computational methods based on sequence similarity that aim to detect such linkage. In addition, the amount of data on S. cerevisiae genes and proteins that is not directly based on sequence is increasing rapidly. This allows investigation of the connections and correlations between classifications based on these types of data and classification based solely on sequence similarity. In this work we start with a simple algorithm that clusters genes by the BLAST E-score of their similarity. We analyze how well function can be inferred from these clusters and for how many currently uncharacterized genes a prediction can be suggested. By these measures, we show that even a simple algorithm achieves better results than simply considering the BLAST output of matching genes. In the second part of the paper, we show that there is a highly significant correlation (p-value < 10^-4 for the vast majority of the experiments) between these clusters and other types of classification. Namely, a pair of genes being clustered together is correlated with the genes having similar expression patterns in DNA array experiments and with the encoded proteins participating in protein-protein interactions. Although this correlation is highly significant, it is not strong enough on its own to serve as a tool for predicting co-regulation of genes or interaction of proteins. We discuss possible explanations for this correlation. Furthermore, the statistical evaluation of these results should be taken into account when developing tools aimed at making such predictions.
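The first part of this study clusters genes by BLAST E-score and infers function from the clusters. One simple way to do this, sketched below in Python, is single-linkage clustering with a union-find structure, followed by majority-vote transfer of known functions within each cluster. The E-value cutoff, gene names and function labels are hypothetical, and the authors' exact clustering algorithm may differ.

```python
from collections import Counter


def cluster_by_evalue(genes, hits, threshold=1e-10):
    """Single-linkage clustering: two genes share a cluster if they are
    connected by a chain of BLAST hits with E-value <= threshold."""
    parent = {g: g for g in genes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


def predict_functions(clusters, known):
    """Assign each unannotated gene the most common known function in its cluster."""
    predictions = {}
    for cluster in clusters:
        labels = [known[g] for g in cluster if g in known]
        if not labels:
            continue  # no annotated member, so no prediction
        majority, _ = Counter(labels).most_common(1)[0]
        for g in cluster:
            if g not in known:
                predictions[g] = majority
    return predictions
```

Single linkage lets a gene join a cluster through an indirect chain of hits, which is what can make cluster-based transfer reach genes that a direct BLAST search would miss.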

3.
Building structural models of entire cells has been a long-standing cross-disciplinary challenge for the research community, as it requires an unprecedented level of integration between multiple sources of biological data and enhanced methods for computational modeling and visualization. Here, we present the first 3D structural models of an entire Mycoplasma genitalium (MG) cell, built using the CellPACK suite of computational modeling tools. Our model recapitulates the data described in recent whole-cell systems biology simulations and provides a structural representation for all MG proteins, DNA and RNA molecules, obtained by combining experimental and homology-modeled structures with lattice-based models of the genome. We establish a framework for gathering, curating and evaluating these structures, which exposes current weaknesses of modeling methods and the boundaries of MG structural knowledge, together with visualization methods for exploring functional characteristics of the genome and proteome. We compare two approaches to data gathering, a manually curated workflow and an automated workflow that uses homologous structures; both are appropriate for analyzing mesoscale properties such as crowding and volume occupancy. Analysis of model quality provides estimates of the regularization that will be required when these models are used as starting points for atomic molecular dynamics simulations.

4.
Genome sequencing projects have led to an explosion in the number of known gene products, many of which are hypothetical proteins of unknown function. Analyzing and annotating the functions of hypothetical proteins is important in Staphylococcus aureus, a pathogenic bacterium that causes multiple types of disease by infecting various sites in humans and animals. In this study, ten hypothetical proteins of S. aureus were retrieved from NCBI and their structural and functional characteristics were analyzed using various bioinformatics tools and databases. The analysis revealed that some of them possess functionally important domains and families, and protein-protein interaction partners, including ABC transporter ATP-binding protein, the Multiple Antibiotic Resistance (MAR) family, export proteins, helix-turn-helix domains, arsenate reductase, elongation factor, ribosomal proteins, cysteine protease precursor, type I restriction endonuclease and plasmid recombination enzyme; the hypothetical proteins might share these functions. Structure and binding-site predictions were also made for these proteins, which could be useful in docking studies aimed at drug discovery.

5.
Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but to our knowledge none has explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking EST data in a model consistent with miRNAs undergoing cleavage during maturation. In both the human and mouse genomes, we observed that including flanking ESTs adjacent to, but not overlapping, predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions, and suggest that this strategy can be used to predict other non-coding RNAs that undergo cleavage during maturation.
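The filtering idea, keeping a predicted hairpin only if an EST lies near it without overlapping it, can be sketched as an interval test. This Python sketch assumes half-open 0-based coordinates, a hypothetical 1,000 bp window and a same-strand requirement; none of these parameters come from the abstract.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def has_flanking_est(hairpin: Interval, ests: list[Interval], window: int = 1000) -> bool:
    """True if some same-strand EST lies within `window` bp of the hairpin
    without overlapping it (window and strand rule are assumptions)."""
    for est in ests:
        if est.chrom != hairpin.chrom or est.strand != hairpin.strand:
            continue
        if overlaps(est, hairpin):
            continue  # overlapping ESTs are excluded by design
        if est.end <= hairpin.start:
            gap = hairpin.start - est.end   # EST lies before the hairpin
        else:
            gap = est.start - hairpin.end   # EST lies after the hairpin
        if gap <= window:
            return True
    return False
```

Excluding overlapping ESTs reflects the cleavage model in the abstract: the mature miRNA region is processed out, so supporting transcript evidence is expected beside it rather than across it.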

6.
Liu ZP  Wu LY  Wang Y  Zhang XS  Chen L 《Amino acids》2008,35(3):627-650
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently applied and report recent progress in the in silico annotation of protein function, driven by the accumulation of vast amounts of sequence and structure data. In particular, we emphasize newly developed structure-based methods that identify local structural motifs and reveal their relationship with protein function. These include computational tools that identify structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.

7.

Background

Mimivirus, isolated from Acanthamoeba polyphaga, is the largest virus discovered so far. It is unique among viruses in having genes related to translation, DNA repair and replication that bear close homology to eukaryotic genes. Nevertheless, only a small fraction (33%) of the proteins encoded in this genome have been assigned a function. Furthermore, a large fraction of the unassigned protein sequences bear no sequence similarity to proteins from other genomes. These sequences are referred to as ORFans. Because they lack sequence similarity to other proteins, they cannot be assigned putative functions using standard sequence comparison methods. As part of our genome-wide computational effort to characterize Mimivirus ORFans, we applied fold-recognition methods to predict the structures of these ORFans, and derived further functional predictions from the conservation of functionally important residues in sequence-template alignments.

Results

Using fold recognition, we identified high-confidence computational 3D structural assignments for 21 Mimivirus ORFans. In addition, high-confidence functional predictions for 6 of these ORFans were derived by analyzing the conservation of functional motifs between the predicted structures and proteins of known function. This analysis allowed us to classify these 6 previously unannotated ORFans into specific protein families: carboxylesterase/thioesterase, metal-dependent deacetylase, P-loop kinase, 3-methyladenine DNA glycosylase, BTB domain and eukaryotic translation initiation factor eIF4E.

Conclusion

Using stringent fold-recognition criteria, we assigned three-dimensional structures to 21 of the ORFans encoded in the Mimivirus genome. Further, based on the 3D models and an analysis of the conservation of functionally important residues and motifs, we derived functional attributes for 6 of these ORFans. Our computational identification of important functional sites in these ORFans can serve as the basis for experimental verification of our predictions. Further computational and experimental studies are required to elucidate the 3D structures and functions of the remaining Mimivirus ORFans.

8.
Complete genome sequences in public databases, including the human genome, provide ways to understand the blueprint of life. As of June 29, 2006, 27 archaeal, 326 bacterial and 21 eukaryotic complete genomes were available, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes was in progress. Traditional biochemical and molecular experiments can assign accurate functions to genes in these genomes, but the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may help to close this gap; determining protein function is therefore one of the challenging problems of the post-genome era, and it demands efficient bioinformatics tools to predict the functions of unannotated protein sequences. Here, we discuss some of the recent and popular bioinformatics approaches developed to predict functions for hypothetical proteins.

9.
The FasD protein is essential for the biogenesis of 987P fimbriae of Escherichia coli. In this study, subcellular fractionation was used to demonstrate that FasD is an outer membrane protein. In addition, the accessibility of FasD to proteases established the presence of surface-exposed FasD domains on both sides of the outer membrane. The fasD gene was sequenced, and the deduced amino acid sequence was shown to share homologous domains with a family of outer membrane proteins from various fimbrial systems. Like porins, fimbrial outer membrane proteins are relatively polar, lack typical hydrophobic membrane-spanning domains, and possess secondary structures predicted to be rich in turns and amphipathic beta-sheets. On the basis of the experimental data and structural predictions, FasD is postulated to consist essentially of surface-exposed turns and loops and membrane-spanning, interacting amphipathic beta-strands. To test this prediction, the fasD gene was subjected to random in-frame linker insertion mutagenesis. Preliminary experiments demonstrated that it was possible to produce fasD mutants whose products remain functional for fimbrial export and assembly. Subsequently, 11 fasD alleles containing linker inserts encoding beta-turn-inducing residues were shown to express functional proteins. These insertion sites were designated permissive sites. The inserts used are expected to be least detrimental to the function of FasD when they are inserted into surface-exposed domains not directly involved in fimbrial export. In contrast, FasD is not expected to accommodate such residues in its amphipathic beta-strands without being destabilized in the membrane and losing function. All permissive sites were sequenced and shown to be located in, or one residue away from, predicted turns. In contrast, 5 of 10 sequenced nonpermissive sites mapped to predicted amphipathic beta-strands. These results are consistent with the structural predictions for FasD.
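The final comparison, checking whether each insertion site lies in or within one residue of a predicted turn, is a simple positional test. The Python sketch below assumes a hypothetical one-letter secondary-structure string ("T" for turn, "E" for beta-strand) and invented insertion positions; it does not reproduce the FasD data.

```python
def classify_insertions(secondary_structure: str, insertion_sites: list[int],
                        tolerance: int = 1) -> dict[int, str]:
    """Label each 1-based insertion site as 'turn' (in or within `tolerance`
    residues of a predicted turn), 'strand' (inside a predicted beta-strand)
    or 'other'."""
    turns = [i + 1 for i, s in enumerate(secondary_structure) if s == "T"]
    labels = {}
    for site in insertion_sites:
        if any(abs(site - t) <= tolerance for t in turns):
            labels[site] = "turn"
        elif secondary_structure[site - 1] == "E":
            labels[site] = "strand"
        else:
            labels[site] = "other"
    return labels
```

Tallying these labels separately for permissive and nonpermissive sites gives the kind of comparison the abstract reports.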

10.
Genome-scale datasets have been used extensively in model organisms to screen for specific candidates or to predict functions for uncharacterized genes. However, despite the extensive knowledge available for model organisms, the planning of genome-scale experiments in poorly studied species is still based on expert intuition or heuristic trials. We propose that computational and systematic approaches can drive the experiment-planning process in poorly studied species, based on the data and knowledge available for closely related model organisms. In this paper, we suggest a computational strategy for recommending genome-scale experiments based on their capability to interrogate diverse biological processes and thereby enable protein function assignment. To this end, we use the data-rich functional genomics compendium of the model organism to quantify the accuracy of each dataset in predicting each specific biological process, and the overlap in such coverage between different datasets. Our approach uses an optimized combination of these quantities to recommend an ordered list of experiments for accurately annotating most proteins in the poorly studied related organism to most biological processes, as well as a set of experiments that target each specific biological process. The effectiveness of this experiment-planning system is demonstrated for two related yeast species: the model organism Saccharomyces cerevisiae and the comparatively poorly studied Saccharomyces bayanus. Our system recommended a set of S. bayanus experiments based on an S. cerevisiae microarray data compendium. In silico evaluations estimate that fewer than 10% of the experiments could achieve functional coverage similar to that of the whole microarray compendium. This estimate was confirmed by performing the recommended experiments in S. bayanus, significantly reducing the labor devoted to characterizing the poorly studied genome.
This experiment-planning framework could readily be adapted to the design of other types of large-scale experiments and to other groups of organisms.
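The abstract describes recommending an ordered list of experiments from per-process accuracy scores while accounting for overlap between datasets. A greedy marginal-coverage heuristic is one plausible way to do this; the Python sketch below uses that heuristic (not necessarily the authors' optimization) with hypothetical dataset names and scores.

```python
def recommend_experiments(accuracy: dict[str, dict[str, float]], k: int) -> list[str]:
    """Greedily pick up to k datasets.

    accuracy maps dataset -> {biological process: score in [0, 1]}.
    The coverage of a process is the best score among the chosen datasets,
    so a dataset redundant with earlier picks adds little or nothing.
    """
    processes = {p for scores in accuracy.values() for p in scores}
    best = {p: 0.0 for p in processes}
    chosen: list[str] = []
    remaining = set(accuracy)

    def gain(ds: str) -> float:
        return sum(max(0.0, accuracy[ds].get(p, 0.0) - best[p]) for p in processes)

    for _ in range(min(k, len(accuracy))):
        pick = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        if gain(pick) <= 0:
            break  # nothing left adds coverage
        chosen.append(pick)
        remaining.discard(pick)
        for p, s in accuracy[pick].items():
            best[p] = max(best[p], s)
    return chosen
```

Because a process's coverage is the maximum over selected datasets, a near-duplicate of an already chosen experiment gains almost nothing; this is how overlap is penalized in the sketch.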

11.

Background  

Much of the Plasmodium falciparum genome encodes hypothetical proteins with limited homology to other organisms. A lack of robust tools for genetic manipulation of the parasite limits functional analysis of these hypothetical proteins and of other aspects of the Plasmodium genome. Transposon mutagenesis has been used widely to identify gene functions in many organisms and would be extremely valuable for functional analysis of the Plasmodium genome.

12.
The PEDANT genome database (http://pedant.gsf.de) provides exhaustive automatic analysis of genomic sequences by a large variety of established bioinformatics tools through a comprehensive Web-based user interface. One hundred and seventy-seven completely sequenced and unfinished genomes have been processed so far, including recently published large eukaryotic genomes (mouse, human). In this contribution, we describe the current status of the PEDANT database and novel analytical features added to the PEDANT server in 2002. These include: (i) integration with the BioRS data retrieval system, which allows fast text queries, (ii) pre-computed sequence clusters in each complete genome, (iii) a comprehensive set of tools for genome comparison, including genome comparison tables and protein function prediction based on genomic context, and (iv) computation and visualization of protein-protein interaction (PPI) networks based on experimental data. The availability of functional and structural predictions for 650,000 genomic proteins in well-organized form makes PEDANT a useful resource for both functional and structural genomics.
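Function prediction from genomic context often relies on conserved gene neighborhoods: genes that stay adjacent across several genomes tend to be functionally linked. The Python sketch below counts such conserved adjacent pairs; the genome names, ortholog-group labels and thresholds are hypothetical, and PEDANT's actual implementation is not described in the abstract.

```python
def conserved_neighbors(genomes: dict[str, list[str]], max_gap: int = 1,
                        min_genomes: int = 2) -> set[tuple[str, str]]:
    """Return pairs of ortholog groups found within `max_gap` positions of each
    other in at least `min_genomes` genomes.

    genomes maps a genome name to its ortholog-group labels in chromosomal order.
    """
    support: dict[frozenset, int] = {}
    for order in genomes.values():
        pairs = set()  # count each pair at most once per genome
        for i in range(len(order)):
            for j in range(i + 1, min(len(order), i + max_gap + 1)):
                if order[i] != order[j]:
                    pairs.add(frozenset((order[i], order[j])))
        for pair in pairs:
            support[pair] = support.get(pair, 0) + 1
    return {tuple(sorted(p)) for p, n in support.items() if n >= min_genomes}
```

A pair that stays together in several otherwise rearranged genomes is a candidate functional link, and an uncharacterized member can inherit a tentative role from its neighbor.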

13.
14.
As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described here, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), quickly and accurately predicts the biochemical function of proteins by representing the residues of the predicted local active site as graphs rather than in Cartesian coordinates. We compare GRASP-Func with our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6-Hairpin Glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each superfamily, SALSA and the much faster GRASP-Func yield similar correct classifications of previously characterized proteins, providing a validated benchmark for the new method. In addition, we used SALSA and GRASP-Func to predict the functions of SG proteins. Forty-one SG proteins in the RPBB superfamily, nine in the 6-HG superfamily, and one in the CAL/G superfamily were successfully classified by both methods into one of the functional families of their respective superfamily. This improved, faster and validated computational method can yield more reliable function predictions for a wide variety of applications in the community.
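The key idea of GRASP-Func, representing active-site residues as a graph rather than in Cartesian coordinates, can be illustrated with a distance-threshold graph whose edges are labeled by residue-type pairs. Comparing such graphs needs no structural superposition, because inter-residue distances do not change under rotation or translation. This Python sketch uses a hypothetical 8 Å cutoff and a weighted Jaccard similarity; it is not the published GRASP-Func algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def site_graph(residues, cutoff=8.0):
    """residues: list of (residue_label, (x, y, z)).
    Returns a Counter of residue-pair edges whose coordinates lie within `cutoff`."""
    edges = Counter()
    for (la, ca), (lb, cb) in combinations(residues, 2):
        if math.dist(ca, cb) <= cutoff:
            edges[tuple(sorted((la, lb)))] += 1
    return edges


def graph_similarity(g1, g2):
    """Weighted Jaccard index over labeled-edge counts (1.0 = identical)."""
    keys = set(g1) | set(g2)
    if not keys:
        return 1.0
    inter = sum(min(g1[k], g2[k]) for k in keys)
    union = sum(max(g1[k], g2[k]) for k in keys)
    return inter / union
```

Skipping the superposition step is one plausible reason a graph-based comparison can run much faster than a structural alignment, although the abstract does not state the source of GRASP-Func's speed-up.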

15.
Aliivibrio salmonicida causes "cold-water vibriosis" (or "Hitra disease") in fish, including marine-reared Atlantic salmon. During development of the disease the bacterium encounters macrophages with antibacterial activities such as the production of damaging reactive oxygen species (ROS). To defend itself, the bacterium presumably starts producing detoxifying enzymes, reducing agents, and proteins involved in DNA and protein repair systems. Even though responses to oxidative stress are well studied in a few model bacteria, little work has been done to explain how important groups of pathogens, such as members of the family Vibrionaceae, survive high levels of ROS. We used bioinformatic tools and microarrays to study how A. salmonicida responds to hydrogen peroxide (H2O2). First, we used the recently published genome sequence to predict potential binding sites for OxyR, the H2O2 response regulator. The computer-based search identified OxyR sites associated with 20 single genes and 8 operons, and these predictions were compared with experimental data from Northern blot and microarray analyses. In general, the OxyR binding-site predictions and the experimental results agree. Up- and down-regulated genes are distributed among all functional gene categories, but a striking number of genes up-regulated at least 2-fold encode proteins involved in detoxification and DNA repair, are part of reduction systems, or are involved in carbon metabolism and regeneration of NADPH. Our predictions and -omics data corroborate findings from other model bacteria, but also suggest species-specific gene regulation.
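Predicting regulator binding sites in a genome is commonly done by scanning both strands with a position weight matrix (PWM). The Python sketch below builds a log-odds PWM from a few aligned sites and scans a sequence; the training sites, pseudocount and score threshold are hypothetical and are not the OxyR model used in the study.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from equal-length aligned DNA sites (uniform background)."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(sequence, pwm, threshold):
    """Return (0-based position, strand, score) for windows scoring >= threshold.
    Assumes the sequence contains only A, C, G and T."""
    w = len(pwm)
    hits = []
    for i in range(len(sequence) - w + 1):
        window = sequence[i:i + w]
        for strand, word in (("+", window), ("-", revcomp(window))):
            score = sum(col[b] for col, b in zip(pwm, word))
            if score >= threshold:
                hits.append((i, strand, round(score, 2)))
    return hits
```

Scanning the reverse complement matters because a regulator binds double-stranded DNA, so a site may be annotated on either strand relative to the gene it controls.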

16.
The proliferation of genome sequence data has led to the development of a number of tools and strategies that facilitate computational analysis. These methods include the identification of motif patterns, membership of the query sequences in family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima by employing the combined use of the above methods. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that the combined use of existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence.

17.
Filizola M  Weinstein H 《The FEBS journal》2005,272(12):2926-2938
To achieve a structural context for the analysis of G-protein coupled receptor (GPCR) oligomers, molecular modeling must be used to predict the corresponding interaction interfaces. The task is complicated by the paucity of detailed structural data at atomic resolution, and by the large number of possible modes in which the bundles of seven transmembrane (TM) segments of the interacting GPCR monomers can be packed together into dimers and/or higher-order oligomers. Approaches and tools offered by bioinformatics can reduce the complexity of this task and, combined with computational modeling, can yield testable predictions for the structural properties of oligomers. Most bioinformatics methods take advantage of the evolutionary relationships among GPCRs, as expressed in their sequences and measurable in the common elements of their structural and functional features. These common elements are responsible for the detectable patterns of motifs and correlated mutations evident in alignments of the sequences of these complex biological systems. Decoding these patterns in terms of structural and functional determinants can indicate the most likely interfaces of GPCR dimerization/oligomerization. We review here the main bioinformatics approaches, enhanced by computational molecular modeling, that have been used to predict likely interfaces of GPCR dimerization/oligomerization, and compare the results of their application to rhodopsin-like GPCRs. A compilation of the most frequently predicted GPCR oligomerization interfaces points to specific regions of TMs 4-6.
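Correlated mutations are often detected by computing the mutual information (MI) between pairs of alignment columns. The Python sketch below ranks column pairs by raw MI for a toy alignment; the alignment is invented, gaps would simply be treated as one more symbol, and practical analyses usually correct raw MI for phylogenetic and entropy effects.

```python
import math
from collections import Counter


def mutual_information(col_a: str, col_b: str) -> float:
    """MI in bits between two alignment columns given as equal-length strings."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in pab.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((pa[x] / n) * (pb[y] / n)))
    return mi


def top_correlated_pairs(alignment: list[str], k: int = 3):
    """Return the k column pairs (i, j, MI) with the highest mutual information."""
    ncols = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(ncols)]
    scored = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i in range(ncols)
        for j in range(i + 1, ncols)
    ]
    scored.sort(reverse=True)
    return [(i, j, round(mi, 3)) for mi, i, j in scored[:k]]
```

Columns that change together across the family (here, A pairs with L and G pairs with M) score high, whereas a fully conserved column scores zero with everything, which is why conserved motifs and correlated mutations carry complementary information.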

18.
Immunoprecipitation of RNA-binding proteins (RBPs) after in vivo crosslinking, coupled with sequencing of the associated RNA footprints (HITS-CLIP, CLIP-seq), is a method of choice for identifying RNA targets and binding sites of RBPs. Compared with RNA-seq, CLIP-seq analysis is highly diverse: approaches vary significantly depending on the RBP analyzed, which necessitates flexible and efficient informatics tools. In this study, we present CLIPSeqTools, a novel, highly flexible computational suite that can perform analysis from raw sequencing data with minimal user input. It contains a wide array of tools that provide an in-depth view of CLIP-seq data sets. It supports extensive customization and encourages improvisation, a critical virtue, since CLIP-seq analysis is rarely well defined a priori. To highlight the capabilities of CLIPSeqTools, we used the suite to analyze Ago-miRNA HITS-CLIP data sets that we prepared from human brains.

19.
Helicobacter pylori is a flagellated, slow-growing, gram-negative bacterium that persistently infects about half of the world's population. In the present study, we examined the proteome of H. pylori strain HPAG1 to identify key uncharacterized proteins with potential novel regulatory functions. The complete proteome of this strain consists of 1539 proteins, of which 520 are annotated as hypothetical. Based on functional motifs in their primary sequences, we classified 254 of these hypothetical proteins into 6 functional categories. The KEGG database was then used to find the roles of these hypothetical proteins in various pathways, and their structures were predicted by homology modeling. Thirty-three of these hypothetical proteins were found to be strongly associated with various pathways, including signaling and defense mechanisms. We noted that 27 of these proteins are specific to H. pylori and could be selected as drug design targets based on their virulence and regulatory roles. We successfully modeled the 3D structures of three of these proteins: YP_626977.1, YP_626786.1 and YP_628146.1. The stability of these proteins was also validated using molecular dynamics simulations, and their possible roles in regulating different pathways are discussed. These novel annotations may contribute to understanding the disease mechanism at the molecular level and provide potential new targets for designing drugs against H. pylori strain HPAG1.

20.