Similar literature
 20 similar articles found
1.
2.
Characterizing gene function is one of the major challenges of the post-genomic era. To address this challenge, we have developed GeneFAS (Gene Function Annotation System), a new integrated probabilistic method for cellular function prediction that combines information from protein-protein interactions, protein complexes, microarray gene expression profiles, and annotations of known proteins through an integrative statistical model. Our approach is based on a novel assessment of the relationship between (1) the interaction/correlation of two proteins' high-throughput data and (2) their functional relationship in terms of their Gene Ontology (GO) hierarchy. We have developed a Web server for the predictions. We have applied our method to the yeast Saccharomyces cerevisiae and predicted functions for 1548 out of 2472 unannotated proteins.

3.
JIGSAW: integration of multiple sources of evidence for gene prediction   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models. RESULTS: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods. AVAILABILITY: JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw.

4.
Autophagy is a self-degradative process that is crucial for maintaining cellular homeostasis by removing damaged cytoplasmic components and recycling nutrients. This evolutionarily conserved proteolysis process is regulated by the autophagy-related (Atg) proteins. The incomplete understanding of the plant autophagy proteome and the importance of a proteome-wide understanding of the autophagy pathway prompted us to predict Atg proteins and regulators in Arabidopsis. Here, we developed a systems-level algorithm to identify autophagy-related modules (ARMs) based on protein subcellular localization, protein–protein interactions, and known Atg proteins. This generates a detailed landscape of the autophagic modules in Arabidopsis. We found that the newly identified genes in each ARM tend to be upregulated and coexpressed during the senescence stage of Arabidopsis. We also demonstrated that the Golgi apparatus ARM, ARM13, functions in the autophagy process by module clustering and functional analysis. To verify the in silico analysis, the Atg candidates in ARM13 that are functionally similar to the core Atg proteins were selected for experimental validation. Interestingly, two of the previously uncharacterized proteins identified from the ARM analysis, AGD1 and Sec14, exhibited bona fide association with the autophagy protein complex in plant cells, which provides evidence for cross-talk between intracellular pathways and autophagy. Thus, the computational framework has facilitated the identification and characterization of plant-specific autophagy-related proteins and novel autophagy proteins/regulators in higher eukaryotes.

5.
Attention deficit hyperactivity disorder (ADHD) is a common, highly heritable psychiatric disorder characterized by hyperactivity, inattention and increased impulsivity. In recent years, a large number of genetic studies of ADHD have been published and the related genetic data have accumulated rapidly. To provide researchers with a comprehensive ADHD genetic resource, we previously developed the first genetic database for ADHD (ADHDgene). The abundant genetic data provide novel candidates for further study. At the same time, they pose a new challenge: selecting promising candidate genes for replication and verification research. In this study, we surveyed the computational tools for candidate gene prioritization and selected five tools, each of which integrates multiple data sources, to prioritize ADHD candidate genes in ADHDgene. The prioritization analysis resulted in 16 prioritized candidate genes, which are mainly involved in several major neurotransmitter systems or in nervous system development pathways. Among these genes, those related to nervous system development, especially SNAP25 and STX1A, and the gene-gene interactions related to each of them deserve further investigation. Our results may provide new insight for further verification studies and facilitate the exploration of the pathogenic mechanisms of ADHD.

6.
Interpreting genome sequences requires the functional analysis of thousands of predicted proteins, many of which are uncharacterized and without obvious homologs. To assess whether the roles of large sets of uncharacterized genes can be assigned by targeted application of a suite of technologies, we used four complementary protein-based methods to analyze a set of 100 uncharacterized but essential open reading frames (ORFs) of the yeast Saccharomyces cerevisiae. These proteins were subjected to affinity purification and mass spectrometry analysis to identify copurifying proteins, two-hybrid analysis to identify interacting proteins, fluorescence microscopy to localize the proteins, and structure prediction methodology to predict structural domains or identify remote homologies. Integration of the data assigned function to 48 ORFs using at least two of the Gene Ontology (GO) categories of biological process, molecular function, and cellular component; 77 ORFs were annotated by at least one method. This combination of technologies, coupled with annotation using GO, is a powerful approach to classifying genes.

7.
Chen Y  Wang W  Zhou Y  Shields R  Chanda SK  Elston RC  Li J 《PloS one》2011,6(6):e21137
Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship between candidate genes and known disease genes, as reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed large-scale cross-validation analysis on 110 disease families using three data sources. The results show that our approach consistently outperforms two other state-of-the-art programs. A case study using Parkinson disease (PD) identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.
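
As a rough illustration of the diffusion kernel measure mentioned in the abstract above, the following sketch computes K = exp(-beta * L) for a tiny toy network, where L is the graph Laplacian, and ranks candidates by their kernel similarity to a known disease gene. The network, the gene labels, the value of beta, the truncated Taylor series and the ranking rule are all assumptions made for illustration; the abstract does not specify how the authors compute or normalize their measure.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix (no self-loops)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for the sketch."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) with a truncated Taylor series."""
    n = len(adj)
    lap = laplacian(adj)
    m = [[-beta * lap[i][j] for j in range(n)] for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]          # current series term, starts at I
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # m^k / k!
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def rank_candidates(kernel, known, candidates):
    """Order candidates by summed kernel similarity to the known genes (assumed rule)."""
    scores = {c: sum(kernel[c][k] for k in known) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Hypothetical toy network: a path g0 - g1 - g2 - g3, with g0 the known disease gene.
GENES = ["g0", "g1", "g2", "g3"]
ADJ = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]

K = diffusion_kernel(ADJ, beta=0.5)
ranking = rank_candidates(K, known=[0], candidates=[1, 2, 3])
print([GENES[i] for i in ranking])
```

Unlike a shortest-path distance, the kernel accounts for every path between two genes, which is one reason such measures are said to reflect the overall topology of a network.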

8.
While it has been established that microRNAs (miRNAs) play key roles throughout development and are dysregulated in many human pathologies, the specific processes and pathways regulated by individual miRNAs are mostly unknown. Here, we use computational target predictions to automatically infer the processes affected by human miRNAs. Our approach improves upon standard statistical tools by addressing specific characteristics of miRNA regulation. Our analysis is based on a novel compendium of experimentally verified miRNA-pathway and miRNA-process associations that we constructed, which can be a useful resource in itself. Our method also predicts novel miRNA-regulated pathways, refines the annotation of miRNAs for which only crude functions are known, and assigns differential functions to miRNAs with closely related sequences. Applying our approach to groups of co-expressed genes allows us to identify miRNAs and genomic miRNA clusters with functional importance in specific stages of early human development. A full list of the predicted miRNA functions is available at http://acgt.cs.tau.ac.il/fame/.

9.
MOTIVATION: With the increasing availability of diverse biological information, protein function prediction approaches have converged towards integration of heterogeneous data. Many approaches adapt existing techniques, such as machine-learning and probabilistic methods, which have proven successful on specific data types. However, the impact of these approaches is hindered by two factors. First, there is little comparison between existing approaches. This is in part due to a divergence in the focus adopted by different works, which makes comparison difficult or even fuzzy. Second, there seems to be over-emphasis on the use of computationally demanding machine-learning methods, which runs counter to the surge in biological data. Analogous to the success of BLAST for sequence homology search, we believe that the ability to tap the escalating quantity, quality and diversity of biological data is crucial to the success of automated function prediction as a useful instrument for the advancement of proteomic research. We address these problems by: (1) providing a useful comparison between some prominent methods; (2) proposing Integrated Weighted Averaging (IWA), a scalable, efficient and flexible function prediction framework that integrates diverse information using simple weighting strategies and a local prediction method. The simplicity of the approach makes it possible to make predictions based on on-the-fly information fusion. RESULTS: In addition to its greater efficiency, IWA performs exceptionally well against existing approaches. In the presence of cross-genome information, which is overwhelming for existing approaches, IWA makes even better predictions. We also demonstrate the significance of appropriate weighting strategies in data integration.
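
To make the idea of combining data sources through simple weights and a local (neighbour-based) prediction more concrete, here is a minimal sketch. It is not the IWA formula, which the abstract does not give: the scoring rule, the source names, the weights and the annotations below are all hypothetical and chosen only for illustration.

```python
def weighted_vote(protein, sources, weights, annotations):
    """Score each function by the weight-averaged share of neighbours carrying it.

    sources:     {source_name: {protein: [neighbour, ...]}}
    weights:     {source_name: reliability weight}  (assumed to be given)
    annotations: {protein: set of function labels}
    """
    totals = {}
    norm = 0.0
    for name, edges in sources.items():
        w = weights[name]
        for neighbour in edges.get(protein, []):
            norm += w                      # every neighbour contributes its source weight
            for func in annotations.get(neighbour, ()):
                totals[func] = totals.get(func, 0.0) + w
    if norm == 0.0:
        return {}                          # no neighbours in any source: no prediction
    return {func: total / norm for func, total in totals.items()}


# Hypothetical inputs: two data sources with different assumed reliabilities.
sources = {
    "ppi": {"P1": ["P2", "P3"]},
    "coexpression": {"P1": ["P4"]},
}
weights = {"ppi": 0.8, "coexpression": 0.4}
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "stress response"},
    "P4": {"metabolism"},
}

scores = weighted_vote("P1", sources, weights, annotations)
for func, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{func}: {score:.2f}")
```

Because each prediction only touches the immediate neighbours of the query protein, a scheme of this kind can be evaluated on the fly as new data sources are added, which is the kind of scalability the abstract emphasizes.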

10.
11.

Background  

The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes.

12.
Context-sensitive data integration and prediction of biological networks   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context. RESULTS: We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios. AVAILABILITY: A software implementation of our approach is available on request from the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://avis.princeton.edu/contextPIXIE/
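
One simple way to picture context-sensitive Bayesian integration is a naive Bayes update in which each dataset's likelihood ratio depends on the biological process being queried. The sketch below assumes conditional independence between datasets and uses invented likelihood ratios, context names and a prior; it illustrates the general idea only and is not the model described by the authors.

```python
def posterior(prior, likelihood_ratios):
    """Update a prior probability with independent likelihood ratios (naive Bayes)."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


def context_posterior(prior, evidence, lr_table, context):
    """Look up each observed dataset's likelihood ratio for the given context."""
    ratios = [lr_table[context][dataset] for dataset in evidence]
    return posterior(prior, ratios)


# Hypothetical likelihood ratios P(evidence | linked) / P(evidence | not linked),
# one table per biological context. The numbers are made up for illustration.
LR_TABLE = {
    "ribosome biogenesis": {"coexpression": 8.0, "two_hybrid": 1.5},
    "signalling": {"coexpression": 1.2, "two_hybrid": 6.0},
}

PRIOR = 0.01  # assumed prior probability that two genes are functionally linked

p_ribosome = context_posterior(PRIOR, ["coexpression"], LR_TABLE, "ribosome biogenesis")
p_signalling = context_posterior(PRIOR, ["coexpression"], LR_TABLE, "signalling")
print(f"ribosome biogenesis: {p_ribosome:.4f}")
print(f"signalling:          {p_signalling:.4f}")
```

The same coexpression observation moves the posterior much further in the context where that data type is assumed to be informative, which is the effect a context-free integration would average away.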

13.
Increasing evidence demonstrates the importance of long coiled-coil proteins for the spatial organization of cellular processes. Although several protein classes with long coiled-coil domains have been studied in animals and yeast, our knowledge about plant long coiled-coil proteins is very limited. The repeat nature of the coiled-coil sequence motif often prevents the simple identification of homologs of animal coiled-coil proteins by generic sequence similarity searches. As a consequence, counterparts of many animal proteins with long coiled-coil domains, like lamins, golgins, or microtubule organization center components, have not been identified yet in plants. Here, all Arabidopsis proteins predicted to contain long stretches of coiled-coil domains were identified by applying the algorithm MultiCoil to a genome-wide screen. A searchable protein database, ARABI-COIL (http://www.coiled-coil.org/arabidopsis), was established that integrates information on number, size, and position of predicted coiled-coil domains with subcellular localization signals, transmembrane domains, and available functional annotations. ARABI-COIL serves as a tool to sort and browse Arabidopsis long coiled-coil proteins to facilitate the identification and selection of candidate proteins of potential interest for specific research areas. Using the database, candidate proteins were identified for Arabidopsis membrane-bound, nuclear, and organellar long coiled-coil proteins.

14.
Liu Z  Ma Q  Cao J  Gao X  Ren J  Xue Y 《Molecular bioSystems》2011,7(10):2737-2740
Recent experiments revealed the prokaryotic ubiquitin-like protein (PUP) to be a signal for the selective degradation of proteins in Mycobacterium tuberculosis (Mtb). By covalently conjugating the PUP, pupylation functions as a critical post-translational modification (PTM) conserved in actinomycetes. Here, we designed a novel computational tool, GPS-PUP, for the prediction of pupylation sites, which showed promising performance. From small-scale and large-scale studies we collected 238 potentially pupylated substrates for which the exact pupylation sites had not yet been determined. As an example application, we predicted at least one potential pupylation site for ~85% of these proteins. Furthermore, through functional analysis, we observed that pupylation can target various substrates so as to regulate a broad array of biological processes, such as the response to stress, sulfate and proton transport, and metabolism. The prediction and analysis results should be useful for further experimental investigation. GPS-PUP 1.0 is freely available at: .

15.
16.
Lipids are important compounds for human physiology and as renewable resources for fuels and chemicals. In lipid research, there is a big gap between the currently available pathway-level representations of lipids and lipid structure databases in which the number of compounds is expanding rapidly with high-throughput mass spectrometry methods. In this work, we introduce a computational approach to bridge this gap by making associations between metabolic pathways and the lipid structures increasingly discovered through lipidomics studies. Our approach, called NICELips (Network Integrated Computational Explorer for Lipidomics), is based on the formulation of generalized enzymatic reaction rules for lipid metabolism, and it employs the generalized rules to postulate novel pathways of lipid metabolism. It further integrates all discovered lipids into biological networks of enzymatic reactions that constitute their biosynthesis and biodegradation pathways. We illustrate the utility of our approach through a case study of bis(monoacylglycero)phosphate (BMP), a biologically important glycerophospholipid whose synthesis and catabolic route(s) are incompletely characterized. Using NICELips, we were able to propose various synthesis and degradation pathways for this compound and for several other lipids whose metabolism, like that of BMP, is unknown, as well as several alternative novel biosynthesis and biodegradation pathways for lipids with known metabolism. NICELips has potential applications in designing therapeutic interventions for lipid-associated disorders and in the metabolic engineering of model organisms for improving the biobased production of lipid-derived fuels and chemicals.

17.
MOTIVATION: Insertion mutagenesis, using transgenes or endogenous transposons, is a popular method for generating null mutations (knockouts) in model organisms. Insertions are mapped to specific genes by amplifying (via TAIL-PCR) and sequencing genomic regions flanking the inserted DNA. The presence of multiple TAIL-PCR templates in one sequencing reaction results in chimeric sequence of intermittently low quality. Standard processing of this sequence by applying Phred quality requirements results in loss of informative sequence, whereas not trimming low-quality sequence causes inclusion of low-complexity homopolymers from the ends of sequence runs. Accurate mapping of the flanking sequences is complicated by the presence of gene families. RESULTS: Methods for extracting informative regions from sequence traces obtained by sequencing multiple TAIL-PCR fragments in a single reaction are described. The completely sequenced Arabidopsis genome was used to identify informative TAIL-PCR sequence regions. Methods were devised to define and select high quality matches and precisely map each insert to the correct genome location. These methods were used to analyze sequence of TAIL-PCR-amplified flanking regions of the inserts from individual plants in a T-DNA-mutagenized population of Arabidopsis thaliana, and are applicable to similar situations where a reference genome can be used to extract information from poor-quality sequence.
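
The trade-off described above, where strict Phred-based trimming discards informative sequence, can be seen in a small sketch of conventional quality trimming. A Phred score Q corresponds to an error probability of 10^(-Q/10). The threshold of 20, the toy quality values and the "keep the longest passing run" rule are assumptions for illustration; this shows the standard processing the abstract contrasts against, not the authors' reference-genome-based method.

```python
def error_probability(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def longest_high_quality_run(quals, min_q=20):
    """Return (start, end) of the longest run of bases with quality >= min_q.

    end is exclusive, so read[start:end] is the retained sequence.
    """
    best = (0, 0)
    start = None
    # A sentinel score of -1 closes any run still open at the end of the read.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


# Hypothetical read with an intermittent quality dip, as in a chimeric trace.
read = "ACGTTGCAA"
quals = [5, 30, 32, 8, 25, 40, 38, 31, 6]

start, end = longest_high_quality_run(quals, min_q=20)
print(read[start:end], f"(bases {start}..{end - 1})")
print(f"error probability at Q20: {error_probability(20):.2f}")
```

In this toy read, the single low-quality base at position 3 causes the two good bases before it to be dropped, even though they might still map uniquely to a reference genome.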

18.
19.
Liu Z  Cao J  Ma Q  Gao X  Ren J  Xue Y 《Molecular bioSystems》2011,7(4):1197-1204
The last decade has witnessed rapid progress in the identification of protein tyrosine nitration (PTN), which is an essential and ubiquitous post-translational modification (PTM) that plays a variety of important roles in both physiological and pathological processes, such as the immune response, cell death, aging and neurodegeneration. Identification of site-specific nitrated substrates is fundamental for understanding the molecular mechanisms and biological functions of PTN. As an alternative to labor-intensive and time-consuming experimental approaches, here we report the development of the novel software package GPS-YNO2 to predict PTN sites. The software demonstrated a promising accuracy of 76.51%, a sensitivity of 50.09% and a specificity of 80.18% in leave-one-out validation. As an example application, we predicted potential PTN sites for hundreds of nitrated substrates which had been experimentally detected in small-scale or large-scale studies, even though the actual nitration sites had not yet been determined. Through a statistical functional comparison with S-nitrosylation, a nitric oxide (NO) dependent reversible modification, we observed that PTN preferentially targets certain fundamental biological processes and functions. These prediction and analysis results might be helpful for further experimental investigation. Finally, the online service and local packages of GPS-YNO2 1.0 were implemented in Java and are freely available at: .
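
For readers less familiar with the three performance figures quoted above, the sketch below shows how accuracy, sensitivity and specificity follow from a confusion matrix. The counts are invented for illustration and are not taken from the paper. Because accuracy is a prevalence-weighted average of sensitivity and specificity, the helper `implied_prevalence` (a hypothetical name) also estimates the fraction of positive sites that the reported figures would imply, assuming all three were computed on the same set of sites.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard definitions of accuracy, sensitivity and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical counts with many more negative than positive sites.
metrics = confusion_metrics(tp=50, fp=200, tn=800, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")

# Figures quoted in the abstract, used only as inputs to the identity above.
p = implied_prevalence(0.7651, 0.5009, 0.8018)
print(f"implied fraction of positive sites: {p:.3f}")
```

An accuracy close to the specificity and far from the sensitivity, as reported here, is what one expects when negative sites heavily outnumber positive ones.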

20.
Protein function prediction: towards integration of similarity metrics   Total citations: 1, self-citations: 0, citations by others: 1
Genomic centers discover increasingly many protein sequences and structures, but not necessarily their full biological functions. Thus, currently, less than one percent of proteins have experimentally verified biochemical activities. To fill this gap, function prediction algorithms apply metrics of similarity between proteins on the premise that those sufficiently alike in sequence, or structure, will perform identical functions. Although high sensitivity is elusive, network analyses that integrate these metrics hold the promise of rapid gains in function prediction specificity.
