Similar articles
20 similar articles found.
1.
PARma is a complete data analysis package for AGO-PAR-CLIP experiments that identifies target sites of microRNAs as well as the microRNA binding to each of these sites. It integrates specific characteristics of the experiments into a generative model. The model and a novel pattern discovery tool are iteratively applied to data to estimate seed activity probabilities and cluster confidence scores, and to assign the most probable microRNA. Based on differential PAR-CLIP analysis and comparison to RIP-Chip data, we show that PARma is more accurate than existing approaches. PARma is available from http://www.bio.ifi.lmu.de/PARma

2.
Various biases affect high-throughput sequencing read counts. Contrary to the general assumption, we show that bias does not always cancel out when fold changes are computed and that bias affects more than 20% of genes that are called differentially regulated in RNA-seq experiments with drastic effects on subsequent biological interpretation. Here, we propose a novel approach to estimate fold changes. Our method is based on a probabilistic model that directly incorporates count ratios instead of read counts. It provides a theoretical foundation for pseudo-counts and can be used to estimate fold change credible intervals as well as normalization factors that outperform currently used normalization methods. We show that fold change estimates are significantly improved by our method by comparing RNA-seq derived fold changes to qPCR data from the MAQC/SEQC project as a reference and analyzing random barcoded sequencing data. Our software implementation is freely available from the project website http://www.bio.ifi.lmu.de/software/lfc.
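
To make the idea of pseudo-counts and fold change credible intervals concrete, here is a minimal Python sketch. It assumes, for illustration only, that the proportion of reads falling in the first condition follows a Beta distribution whose parameters are the two counts plus a pseudo-count. The function names, the default pseudo-count of 1 and the Monte Carlo approach are assumptions of this sketch and are not taken from the lfc software.

```python
import math
import random


def log2_fold_change(a, b, pseudo=1.0):
    """Point estimate of log2(a / b), stabilised by a symmetric pseudo-count."""
    return math.log2((a + pseudo) / (b + pseudo))


def credible_interval(a, b, pseudo=1.0, level=0.95, draws=20000, seed=0):
    """Monte Carlo credible interval for the log2 fold change.

    Assumption of this sketch: the proportion p of reads from condition A
    is Beta(a + pseudo, b + pseudo) distributed, so the fold change is
    p / (1 - p). The interval is read off the sorted posterior samples.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        p = rng.betavariate(a + pseudo, b + pseudo)
        # Keep p strictly inside (0, 1) so the log ratio stays finite.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        samples.append(math.log2(p / (1.0 - p)))
    samples.sort()
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


if __name__ == "__main__":
    print(log2_fold_change(90, 40))
    print(credible_interval(90, 40))
```

With small counts the interval becomes wide, which is the practical reason for reporting an interval alongside the point estimate.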

3.
The last several years have seen the consolidation of high-throughput proteomics initiatives to identify and characterize protein interactions and macromolecular complexes in model organisms. In particular, more than 10,000 high-confidence protein-protein interactions have been described between the roughly 6,000 proteins encoded in the budding yeast genome (Saccharomyces cerevisiae). Unfortunately, high-resolution three-dimensional structures are available for fewer than one hundred of these interacting pairs. Here, we expand this structural information on yeast protein interactions by running the first-ever high-throughput docking experiment with some of the best state-of-the-art methodologies, according to our benchmarks. To increase the coverage of the interaction space, we also explore the possibility of using homology models of varying quality in the docking experiments, instead of experimental structures, and assess how this would affect the global performance of the methods. In total, we have applied the docking procedure to 217 experimental structures and 1,023 homology models, providing putative structural models for over 3,000 protein-protein interactions in the yeast interactome. Finally, we analyze in detail the structural models obtained for the interaction between SAM1-anthranilate synthase complex and the MET30-RNA polymerase III to illustrate how our predictions can be straightforwardly used by the scientific community. The results of our experiment will be integrated into the general 3D-Repertoire pipeline, a European initiative to solve the structures of as many protein complexes in yeast as possible at the best possible resolution. All docking results are available at http://gatealoy.pcb.ub.es/HT_docking/.

4.
5.
6.
The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of transferred interactions is common, but the legitimacy of such inferred interactions is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of […], we estimate the conservation rate to be about […] between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes (which are similar to those previously proposed). Using our estimates of interaction conservation we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfers between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.

7.
The domesticated silkworm, Bombyx mori, an economically important insect, has been used as a lepidopteran molecular model second only to Drosophila. Compared with the genomic information available for the silkworm, protein-protein interaction (PPI) data are limited. Therefore, experimentally identified PPI maps from five model organisms (E. coli, C. elegans, D. melanogaster, H. sapiens and S. cerevisiae) were used to infer the PPI network of the silkworm using the well-recognized interolog-based method. Among the 14623 silkworm proteins, 7736 protein-protein interaction pairs were predicted, involving 2700 unique silkworm proteins. These predictions were validated using iPfam interaction domains and gene expression data. Of the predicted pairs, 625 were associated with iPfam domain-domain interactions, whereas random networks had an average of 9. In the gene expression analysis, the average PCC value was 0.29 for the predicted network and 0.23100±0.00042 for random networks. This indicates that the predicted PPI network of the silkworm is highly significant and reliable. This is the first PPI network for the silkworm; it provides a framework for deciphering the cellular processes governing key metabolic pathways in Bombyx mori and is available at SilkPPI (http://210.212.197.30/SilkPPI/).
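
The ortholog-based transfer of interactions described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the orthology assignments are already available as a dictionary; the function name and the protein identifiers in the usage example are hypothetical and are not taken from SilkPPI.

```python
from itertools import product


def transfer_interactions(source_ppis, orthologs):
    """Predict target-species interactions from model-organism interactions.

    source_ppis: iterable of (protein_a, protein_b) pairs observed in a
                 model organism.
    orthologs:   dict mapping a model-organism protein to the set of its
                 orthologs in the target species.
    Returns a set of sorted (target_a, target_b) tuples.
    """
    predicted = set()
    for a, b in source_ppis:
        # An interaction is transferred only if both partners have orthologs.
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:  # skip self-pairs created by shared orthologs
                predicted.add(tuple(sorted((ta, tb))))
    return predicted


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    model_ppis = [("P1", "P2"), ("P2", "P3")]
    ortholog_map = {"P1": {"Bm_A"}, "P2": {"Bm_B", "Bm_C"}}
    print(sorted(transfer_interactions(model_ppis, ortholog_map)))
```

Pairs whose partners lack an ortholog in the target species are dropped, which is one reason a transferred network covers only part of the target proteome.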

8.
RelEx--relation extraction using dependency parse trees   Total citations: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases. RESULTS: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing that produces dependency parse trees, to which a small number of simple rules are applied. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted approximately 150,000 relations with an estimated performance of both 80% precision and 80% recall. AVAILABILITY: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website (http://www.bio.ifi.lmu.de/publications/RelEx/).

9.
We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.

10.
Most cellular processes are enabled by cohorts of interacting proteins that form dynamic networks within the plant proteome. The study of these networks can provide insight into protein function and provide new avenues for research. This article informs the plant science community of the currently available sources of protein interaction data and discusses how they can be useful to researchers. Using our recently curated IntAct Arabidopsis thaliana protein–protein interaction data set as an example, we discuss potentials and limitations of the plant interactomes generated to date. In addition, we present our efforts to add value to the interaction data by using them to seed a proteome-wide map of predicted protein subcellular locations.

For well over two decades, plant scientists have studied protein interactions within plants using many different and evolving approaches. Their findings are represented by a large and growing corpus of peer-reviewed literature reflecting the increasing activity in this area of plant proteomic research. More recently, a number of predicted interactomes have been reported in plants and, while these predictions remain largely untested, they could act as a useful guide for future research. These studies have allowed researchers to better understand the function of protein complexes and to refine our understanding of protein function within the cell (Uhrig, 2006; Morsy et al., 2008). The extraction of protein interaction data from the literature and its standardized deposition and representation within publicly available databases remains a challenging task. Aggregating the data in databases allows researchers to leverage visualization, data mining, and integrative approaches to produce new insights that would be unachievable when the data are dispersed within largely inaccessible formats (Rodriguez et al., 2009).

Currently, there are three databases that act as repositories of plant protein interaction data. These are IntAct (http://www.ebi.ac.uk/intact/; Aranda et al., 2010), The Arabidopsis Information Resource (TAIR; http://www.Arabidopsis.org/; Poole, 2007), and BioGRID (http://www.thebiogrid.org/; Breitkreutz et al., 2008). These databases curate experimentally established interactions available from the peer-reviewed literature (as opposed to predicted interactions, which will be discussed below). Each repository takes its own approach to the capture, storage, and representation of protein interaction data. TAIR focuses on Arabidopsis thaliana protein–protein interaction data exclusively; BioGRID currently focuses on the plant species Arabidopsis and rice (Oryza sativa), while IntAct attempts to capture protein interaction data from any plant species. Unlike the other repositories, IntAct follows a deep curation strategy that captures detailed experimental and biophysical details, such as binding regions and subcellular locations of interactions using controlled vocabularies (Aranda et al., 2010). While the majority of plant interaction data held by IntAct concern protein–protein interaction data in Arabidopsis, there is a small but growing content of interaction data relating to protein–DNA, protein–RNA, and protein–small molecule interactions, as well as interaction data from other plant species.

Using the IntAct Arabidopsis data set as an example, we outline how the accumulating knowledge captured in these repositories can be used to further our understanding of the plant proteome. We compare the characteristics of predicted interactomes with the IntAct protein–protein interaction data set, which consists entirely of experimentally measured protein interactions, to gauge the predictive accuracy of these studies. Finally, we show how the IntAct data set can be used together with a recently developed Divide and Conquer k-Nearest Neighbors Method (DC-kNN; K. Lee et al., 2008) to predict the subcellular locations for most Arabidopsis proteins. This data set predicts high confidence subcellular locations for many unannotated Arabidopsis proteins and should act as a useful resource for future studies of protein function. Although this article focuses on the IntAct Arabidopsis protein–protein interaction data set, readers are also encouraged to explore the resources offered by our colleagues at TAIR and BioGRID.

Each database employs its own system to report molecular interactions, as represented in the referenced source publications, and each avoids making judgments on interaction reliability or whether two participants in a complex have a direct interaction. Thus, the user should carefully filter these data sets for their specific purpose based on the full annotation of the data sets. In particular, the user should consider the experimental methods and independent observation of the same interaction in different publications when assessing the reliability and type of interaction of the proteins (e.g., direct or indirect). Confidence scoring schemes for interaction data are discussed widely in the literature (Yu and Finley, 2009).
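
The advice to weigh experimental methods and independent observations can be illustrated with a small filter. The sketch below is one possible approach, assuming each interaction record has already been reduced to a (protein_a, protein_b, method, publication) tuple; that layout, the function name and the identifiers in the usage example are hypothetical and do not reflect the export format of IntAct, TAIR or BioGRID.

```python
from collections import defaultdict


def filter_by_support(records, min_publications=2):
    """Keep interactions reported in at least `min_publications` papers.

    records: iterable of (protein_a, protein_b, method, publication_id).
    Returns a dict mapping each retained pair to the sorted list of
    detection methods that reported it, so the user can still judge
    whether the evidence suggests a direct or an indirect interaction.
    """
    publications = defaultdict(set)
    methods = defaultdict(set)
    for a, b, method, publication in records:
        pair = tuple(sorted((a, b)))  # treat A-B and B-A as the same pair
        publications[pair].add(publication)
        methods[pair].add(method)
    return {
        pair: sorted(methods[pair])
        for pair, pubs in publications.items()
        if len(pubs) >= min_publications
    }


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        ("prot1", "prot2", "two hybrid", "pub1"),
        ("prot2", "prot1", "pull down", "pub2"),
        ("prot1", "prot3", "two hybrid", "pub1"),
    ]
    print(filter_by_support(example))
```

Counting distinct publications rather than distinct records avoids treating several experiments from a single paper as independent confirmation.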

11.
Many tumors contain mutations that confer defects in the DNA-damage response and genome stability. DNA-damaging agents are powerful therapeutic tools that can differentially kill cells with an impaired DNA-damage response. The response to DNA damage is complex and composed of a network of coordinated pathways, often with a degree of redundancy. Tumor-specific somatic mutations in DNA-damage response genes could be exploited by inhibiting the function of a second gene product to increase the sensitivity of tumor cells to a sublethal concentration of a DNA-damaging therapeutic agent, resulting in a class of conditional synthetic lethality we call synthetic cytotoxicity. We used the Saccharomyces cerevisiae nonessential gene-deletion collection to screen for synthetic cytotoxic interactions with camptothecin, a topoisomerase I inhibitor, and a null mutation in TEL1, the S. cerevisiae ortholog of the mammalian tumor-suppressor gene, ATM. We found and validated 14 synthetic cytotoxic interactions that define at least five epistasis groups. One class of synthetic cytotoxic interaction was due to telomere defects. We also found that at least one synthetic cytotoxic interaction was conserved in Caenorhabditis elegans. We have demonstrated that synthetic cytotoxicity could be a useful strategy for expanding the sensitivity of certain tumors to DNA-damaging therapeutics.

12.
Large quantities of reliable protein interaction data are available for model organisms in public repositories (e.g., MINT, DIP, HPRD, INTERACT). Most data correspond to experiments with the proteins of Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli and Mus musculus. For other important organisms, data availability is poor or non-existent. Here we present NASCENT, a completely automatic web-based tool, also available as a downloadable Java program, capable of modeling and generating protein interaction networks even for non-model organisms. The tool performs protein interaction network modeling through gene-name mapping, and outputs the resulting network in graphical form and in computer-readable graph formats that can be used directly by popular network modeling software.

Availability

http://nascent.pitgroup.org.

13.

Background

Host-microbe and microbe-microbe interactions are often governed by the complex exchange of metabolites. Such interactions play a key role in determining the way pathogenic and commensal species impact their host and in the assembly of complex microbial communities. Recently, several studies have demonstrated how such interactions are reflected in the organization of the metabolic networks of the interacting species, and introduced various graph theory-based methods to predict host-microbe and microbe-microbe interactions directly from network topology. Using these methods, such studies have revealed evolutionary and ecological processes that shape species interactions and community assembly, highlighting the potential of this reverse-ecology research paradigm.

Results

NetCooperate is a web-based tool and a software package for determining host-microbe and microbe-microbe cooperative potential. It specifically calculates two previously developed and validated metrics for species interaction: the Biosynthetic Support Score which quantifies the ability of a host species to supply the nutritional requirements of a parasitic or a commensal species, and the Metabolic Complementarity Index which quantifies the complementarity of a pair of microbial organisms’ niches. NetCooperate takes as input a pair of metabolic networks, and returns the pairwise metrics as well as a list of potential syntrophic metabolic compounds.

Conclusions

The Biosynthetic Support Score and Metabolic Complementarity Index provide insight into host-microbe and microbe-microbe metabolic interactions. NetCooperate determines these interaction indices from metabolic network topology, and can be used for small- or large-scale analyses. NetCooperate is provided as both a web-based tool and an open-source Python module; both are freely available online at http://elbo.gs.washington.edu/software_netcooperate.html.
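
As a rough illustration of how pairwise scores can be read off network topology, the Python sketch below computes two simple set-overlap fractions from a pair of toy metabolic networks. It is a deliberately simplified stand-in and not the NetCooperate implementation: it assumes that a compound counts as a nutritional requirement only if no reaction in the network produces it, and the function names, network encoding and compound labels are all hypothetical. The published metrics may define requirements and scores differently.

```python
def all_compounds(network):
    """network: dict mapping a substrate to the set of products made from it."""
    products = set().union(*network.values())
    return set(network) | products


def required_compounds(network):
    """Compounds that are consumed but never produced (simplifying assumption)."""
    products = set().union(*network.values())
    return all_compounds(network) - products


def support_fraction(guest, host):
    """Fraction of the guest's required compounds present in the host network."""
    needs = required_compounds(guest)
    if not needs:
        return 0.0
    return len(needs & all_compounds(host)) / len(needs)


def complementarity_fraction(first, second):
    """Fraction of `first`'s required compounds that `second` can produce itself,
    i.e. that occur in `second` without being required by it."""
    needs = required_compounds(first)
    if not needs:
        return 0.0
    producible = all_compounds(second) - required_compounds(second)
    return len(needs & producible) / len(needs)


if __name__ == "__main__":
    # Toy networks with made-up compound labels.
    guest = {"c1": {"c5"}, "c2": {"c6"}}
    host = {"c1": {"c7"}, "c8": {"c2"}}
    print(support_fraction(guest, host))
    print(complementarity_fraction(guest, host))
```

In this toy example both of the guest's required compounds occur in the host network, but only one of them is produced by the host rather than required by it, so the two fractions differ.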

14.
The primary goal of genome-wide association studies (GWAS) is to discover variants that could lead, in isolation or in combination, to a particular trait or disease. Standard approaches to GWAS, however, are usually based on univariate hypothesis tests and therefore can account neither for correlations due to linkage disequilibrium nor for combinations of several markers. To discover and leverage such potential multivariate interactions, we propose in this work an extension of the Random Forest algorithm tailored for structured GWAS data. In terms of risk prediction, we show empirically on several GWAS datasets that the proposed T-Trees method significantly outperforms both the original Random Forest algorithm and standard linear models, thereby suggesting the actual existence of multivariate non-linear effects due to the combinations of several SNPs. We also demonstrate that variable importances as derived from our method can help identify relevant loci. Finally, we highlight the strong impact that quality control procedures may have, both in terms of predictive power and loci identification. Variable importance results and T-Trees source code are all available at www.montefiore.ulg.ac.be/~botta/ttrees/ and github.com/0asa/TTree-source respectively.

15.
Biologists routinely use Microsoft Office applications for standard analysis tasks. Despite ubiquitous internet resources, information needed for everyday work is often not directly and seamlessly available. Here we describe a very simple and easily extendable mechanism using Web Services to enrich standard MS Office applications with internet resources. We demonstrate its capabilities by providing a Web-based thesaurus for biological objects, which maps names to database identifiers and vice versa via an appropriate synonym list. The client application ProTag makes these features available in MS Office applications using Smart Tags and Add-Ins. AVAILABILITY: http://services.bio.ifi.lmu.de/prothesaurus/

16.
Linking networks of molecular interactions to cellular functions and phenotypes is a key goal in systems biology. Here, we adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes. As a general association measure, SANTA can (i) functionally annotate experimentally derived networks using a collection of curated gene sets and (ii) annotate experimentally derived gene sets using a collection of curated networks, as well as (iii) prioritize genes for follow-up analyses. We exemplify the efficacy of SANTA in several case studies using the S. cerevisiae genetic interaction network and genome-wide RNAi screens in cancer cell lines. Our theory, simulations, and applications show that SANTA provides a principled statistical way to quantify the association between molecular networks and cellular functions and phenotypes. SANTA is available from http://bioconductor.org/packages/release/bioc/html/SANTA.html.

17.
SUMMARY: Recent advances in high-throughput technology have increased the quantity of available data on protein complexes and stimulated the development of many new prediction methods. In this article, we present ProCope, a Java software suite for the prediction and evaluation of protein complexes from affinity purification experiments, which integrates the major methods for calculating interaction scores and predicting protein complexes published in recent years. Methods can be accessed via a graphical user interface, command line tools and a Java API. Using ProCope, existing algorithms can be applied quickly and reproducibly to new experimental results, individual steps of the different algorithms can be combined in new and innovative ways, and new methods can be implemented and integrated into the existing prediction framework. AVAILABILITY: Source code and executables are available at http://www.bio.ifi.lmu.de/Complexes/ProCope/.

18.
A proteome-wide protein interaction map for Campylobacter jejuni   Total citations: 2 (self-citations: 0, citations by others: 2)

Background

Data from large-scale protein interaction screens for humans and model eukaryotes have been invaluable for developing systems-level models of biological processes. Despite this value, only a limited amount of interaction data is available for prokaryotes. Here we report the systematic identification of protein interactions for the bacterium Campylobacter jejuni, a food-borne pathogen and a major cause of gastroenteritis worldwide.

Results

Using high-throughput yeast two-hybrid screens we detected and reproduced 11,687 interactions. The resulting interaction map includes 80% of the predicted C. jejuni NCTC11168 proteins and places a large number of poorly characterized proteins into networks that provide initial clues about their functions. We used the map to identify a number of conserved subnetworks by comparison to protein networks from Escherichia coli and Saccharomyces cerevisiae. We also demonstrate the value of the interactome data for mapping biological pathways by identifying the C. jejuni chemotaxis pathway. Finally, the interaction map also includes a large subnetwork of putative essential genes that may be used to identify potential new antimicrobial drug targets for C. jejuni and related organisms.

Conclusion

The C. jejuni protein interaction map is one of the most comprehensive yet determined for a free-living organism and nearly doubles the binary interactions available for the prokaryotic kingdom. This high level of coverage facilitates pathway mapping and function prediction for a large number of C. jejuni proteins as well as orthologous proteins from other organisms. The broad coverage also facilitates cross-species comparisons for the identification of evolutionarily conserved subnetworks of protein interactions.

19.
Accurate inference of molecular and functional interactions among genes, especially in multicellular organisms such as Drosophila, often requires statistical analysis of correlations not only between the magnitudes of gene expressions, but also between their temporal-spatial patterns. The ISH (in-situ-hybridization)-based gene expression micro-imaging technology offers an effective approach to perform large-scale spatial-temporal profiling of whole-body mRNA abundance. However, analytical tools for discovering gene interactions from such data remain an open challenge due to various reasons, including difficulties in extracting canonical representations of gene activities from images, and in inference of statistically meaningful networks from such representations. In this paper, we present GINI, a machine learning system for inferring gene interaction networks from Drosophila embryonic ISH images. GINI builds on a computer-vision-inspired vector-space representation of the spatial pattern of gene expression in ISH images, enabled by our recently developed system; and a new multi-instance-kernel algorithm that learns a sparse Markov network model, in which, every gene (i.e., node) in the network is represented by a vector-valued spatial pattern rather than a scalar-valued gene intensity as in conventional approaches such as a Gaussian graphical model. By capturing the notion of spatial similarity of gene expression, and at the same time properly taking into account the presence of multiple images per gene via multi-instance kernels, GINI is well-positioned to infer statistically sound, and biologically meaningful gene interaction networks from image data. Using both synthetic data and a small manually curated data set, we demonstrate the effectiveness of our approach in network building. 
Furthermore, we report results on a large publicly available collection of Drosophila embryonic ISH images from the Berkeley Drosophila Genome Project, where GINI makes novel and interesting predictions of gene interactions. Software for GINI is available at http://sailing.cs.cmu.edu/Drosophila_ISH_images/

20.