Similar literature
1.
MOTIVATION: To understand biological processes, we must clarify how proteins interact with each other. However, since information about protein-protein interactions still exists primarily in the scientific literature, it is not accessible in a computer-readable format. Efficient processing of large amounts of interactions therefore needs an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein-protein interaction from scientific literature. RESULTS: We present a method for extracting information on protein-protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The result of extraction suggests that our method should be applicable to any species for which a protein name dictionary is constructed. AVAILABILITY: The program is available on request from the authors.
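To make the dictionary-plus-surface-pattern idea concrete, here is a minimal sketch in Python. It is not the authors' program: the protein names, the verb list and the single "A verb B" pattern are hypothetical stand-ins, and the part-of-speech rules mentioned in the abstract are left out entirely.

```python
import re

# Hypothetical mini-dictionary and verb list, for illustration only.
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1", "Swi6"}
INTERACTION_VERBS = {"binds", "interacts", "phosphorylates", "activates"}


def extract_interactions(sentence):
    """Return (protein, verb, protein) triples matching 'A <verb> ... B'."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    triples = []
    for i, token in enumerate(tokens):
        if token not in INTERACTION_VERBS:
            continue
        # Nearest dictionary hit on each side of the interaction verb.
        left = [t for t in tokens[:i] if t in PROTEIN_NAMES]
        right = [t for t in tokens[i + 1:] if t in PROTEIN_NAMES]
        if left and right:
            triples.append((left[-1], token, right[0]))
    return triples


pairs = extract_interactions("Sic1 interacts with Cdc28 in vivo.")
```

A sentence with no dictionary hit on one side of the verb yields nothing, which is why the quality of the name dictionary dominates recall in an approach of this kind.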

2.

Background

The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.

Results

Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information at 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem, and its use was found to reduce the task duration by 70%, thus saving 176 days.

Conclusions

Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
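For readers unfamiliar with the three figures quoted in the Results, the sketch below shows how precision, recall and accuracy follow from confusion-matrix counts. The counts are hypothetical and were chosen only so that the ratios match the reported percentages; they are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts, picked only to reproduce 92% / 92% / 90%.
precision, recall, accuracy = classification_metrics(tp=92, fp=8, tn=52, fn=8)
```

Accuracy can sit below both precision and recall, as it does here, when the negative class is smaller than the positive class.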

3.
Mutations help us to understand the molecular origins of diseases. Researchers, therefore, both publish and seek disease-relevant mutations in public databases and in scientific literature, e.g. Medline. The retrieval tends to be time-consuming and incomplete. Automated screening of the literature is more efficient. We developed extraction methods (called MEMA) that scan Medline abstracts for mutations. MEMA identified 24,351 singleton mutations in conjunction with a HUGO gene name out of 16,728 abstracts. From a sample of 100 abstracts we estimated the recall for the identification of mutation-gene pairs at 35% with a precision of 93%. Recall for the mutation detection alone was >67% with a precision rate of >96%. This shows that our system produces reliable data. The subset consisting of protein sequence mutations (PSMs) from MEMA was compared to the entries in OMIM (20,503 entries versus 6699, respectively). We found 1826 PSM-gene pairs to be common to both datasets (cross-validated). This is 27% of all PSM-gene pairs in OMIM and 91% of those pairs from OMIM which co-occur in at least one Medline abstract. We conclude that Medline covers a large portion of the mutations known to OMIM. Another large portion could be artificially produced mutations from mutagenesis experiments. Access to the database of extracted mutation-gene pairs is available through the web pages of the EBI (refer to http://www.ebi.ac.uk/rebholz/index.html).
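As a rough illustration of scanning abstracts for point mutations, the sketch below matches two common surface notations (one-letter, e.g. R175H, and three-letter, e.g. Gly12Val). The patterns are an assumption for illustration and are not taken from MEMA; gene-name linking is omitted.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_ONE = "ACDEFGHIKLMNPQRSTVWY"
_THREE = "|".join(THREE_TO_ONE)

# Either <one-letter><position><one-letter> or <three-letter><position><three-letter>.
MUTATION_RE = re.compile(
    rf"\b(?:([{_ONE}])(\d+)([{_ONE}])|({_THREE})(\d+)({_THREE}))\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        if m.group(1):
            wild, pos, mutant = m.group(1), m.group(2), m.group(3)
        else:
            wild = THREE_TO_ONE[m.group(4)]
            pos = m.group(5)
            mutant = THREE_TO_ONE[m.group(6)]
        hits.append((wild, int(pos), mutant))
    return hits


found = find_mutations("The R175H and Gly12Val substitutions were found in TP53.")
```

A pattern this loose also matches look-alike tokens such as the cell-line name T47D, which hints at why pairing each hit with a gene name and measuring precision on a sample matters.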

4.
MOTIVATION: Much research has been dedicated to large-scale protein interaction networks including the analysis of scale-free topologies, network modules and the relation of domain-domain to protein-protein interaction networks. Identifying locally significant proteins that mediate the function of modules is still an open problem. METHOD: We use a layered clustering algorithm for interaction networks, which groups proteins by the similarity of their direct neighborhoods. We identify locally significant proteins, called mediators, which link different clusters. We apply the algorithm to a yeast network. RESULTS: Clusters and mediators are organized in hierarchies, where clusters are mediated by and act as mediators for other clusters. We compare the clusters and mediators to known yeast complexes and find agreement with precision of 71% and recall of 61%. We analyzed the functions, processes and locations of mediators and clusters. We found that 55% of mediators to a cluster are enriched with a set of diverse processes and locations, often related to translocation of biomolecules. Additionally, 82% of clusters are enriched with one or more functions. The important role of mediators is further corroborated by a comparatively higher degree of conservation across genomes. We illustrate the above findings with an example of membrane protein translocation from the cytoplasm to the inner nuclear membrane. AVAILABILITY: All software is freely available under Supplementary information.
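Grouping proteins "by the similarity of their direct neighborhoods" can be pictured with a Jaccard index over neighbor sets. The abstract does not say which similarity measure the layered clustering algorithm uses, so Jaccard is an assumption here, and the toy network is hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_similarity(adjacency, u, v):
    """Similarity of the direct neighborhoods of u and v, ignoring u and v themselves."""
    return jaccard(adjacency[u] - {v}, adjacency[v] - {u})


# Hypothetical toy interaction network as an adjacency map.
adjacency = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"D"},
}

sim_ab = neighborhood_similarity(adjacency, "A", "B")
sim_ae = neighborhood_similarity(adjacency, "A", "E")
```

In this toy network A and B share every neighbor and would fall into one cluster, while D, which touches both that pair and E, is the kind of node the abstract calls a mediator.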

5.
This article reviews the most common methods used today for estimating divergence times and rates of molecular evolution. The methods are grouped into three main classes: (1) methods that use a molecular clock and one global rate of substitution, (2) methods that correct for rate heterogeneity, and (3) methods that try to incorporate rate heterogeneity. Additionally, links to the most important literature on molecular dating are given, including articles comparing the performance of different methods, papers that investigate problems related to taxon, gene and partition sampling, and literature discussing highly debated issues like calibration strategies and uncertainties, dating precision and the calculation of error estimates.

6.
Prion, 2013, 7(3): 201-210
The yeast Saccharomyces cerevisiae is a tractable model organism in which both to explore the molecular mechanisms underlying the generation of disease-associated protein misfolding and to map the cellular responses to potentially toxic misfolded proteins. Specific targets have included proteins which in certain disease states form amyloids and lead to neurodegeneration. Such studies are greatly facilitated by the extensive ‘toolbox’ available to the yeast researcher that provides a range of cell engineering options. Consequently, a number of assays at the cell and molecular level have been set up to report on specific protein misfolding events associated with endogenous or heterologous proteins. One major target is the mammalian prion protein PrP because we know little about what specific sequence and/or structural feature(s) of PrP are important for its conversion to the infectious prion form, PrPSc. Here, using a study of the expression in yeast of fusion proteins comprising the yeast prion protein Sup35 fused to various regions of mouse PrP protein, we show how PrP sequences can direct the formation of non-transmissible amyloids and focus in particular on the role of the mouse octarepeat region. Through this study we illustrate the benefits and limitations of yeast-based models for protein misfolding disorders.

8.
Gold standard datasets on protein complexes are key to inferring and validating protein–protein interactions. Despite much progress in characterizing protein complexes in the yeast Saccharomyces cerevisiae, numerous researchers still use as reference the manually curated complexes catalogued by the Munich Information Center of Protein Sequences database. Although this catalogue has served the community extremely well, it no longer reflects the current state of knowledge. Here, we report two catalogues of yeast protein complexes as results of systematic curation efforts. The first one, denoted as CYC2008, is a comprehensive catalogue of 408 manually curated heteromeric protein complexes reliably backed by small-scale experiments reported in the current literature. This catalogue represents an up-to-date reference set for biologists interested in discovering protein interactions and protein complexes. The second catalogue, denoted as YHTP2008, comprises 400 high-throughput complexes annotated with current literature evidence. Among them, 262 correspond, at least partially, to CYC2008 complexes. Evidence for interacting subunits is collected for 68 complexes that have only partial or no overlap with CYC2008 complexes, whereas no literature evidence was found for 100 complexes. Some of these partially supported and as yet unsupported complexes may be interesting candidates for experimental follow up. Both catalogues are freely available at: http://wodaklab.org/cyc2008/.

9.
Functional foods are closely associated with claims on foods. There are two categories of claims on foods: nutrition claims and health claims. Health claims on (functional) foods must be scientifically substantiated. In December 2006, the European Union published its Regulation 1924/2006 on nutrition and health claims made on foods. As concerns scientific evaluation, the EU-project PASSCLAIM resulted in a set of criteria for the scientific substantiation of health claims on foods. The European Food Safety Authority provides the scientific advice to the European Commission for health claims submitted under Regulation 1924/2006 and has hitherto published several hundred opinions on health claims, some of which are positive, some negative and a few citing insufficient evidence. Antioxidant claims have been approved for the general function of vitamins but not for direct health effects in humans. Another issue with claims is consumer understanding. Consumers can hardly distinguish between graded levels of evidence, and they make little or no distinction between nutrition and health claims. Consumers understand nutrition and health claims differently from scientists and regulators. Therefore, innovation in industry can readily proceed via approved nutrition claims and approved health claims. The market and the shelves in the stores will not be empty; rather they will look different in the years to come.

10.
We introduce a new algorithm, called ClusFCM, which combines techniques of clustering and fuzzy cognitive maps (FCM) for prediction of protein functions. ClusFCM takes advantage of protein homologies and protein interaction network topology to improve low recall predictions associated with existing prediction methods. ClusFCM exploits the fact that proteins of known function tend to cluster together and deduces functions not only through their direct interaction with other proteins, but also from other proteins in the network. We use ClusFCM to annotate protein functions for Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), and Drosophila melanogaster (fly) using protein-protein interaction data from the General Repository for Interaction Datasets (GRID) database and functional labels from Gene Ontology (GO) terms. The algorithm's performance is compared with four state-of-the-art methods for function prediction (Majority, chi-square statistics, Markov random field (MRF), and FunctionalFlow) using measures of Matthews correlation coefficient, harmonic mean, and area under the receiver operating characteristic (ROC) curves. The results indicate that ClusFCM predicts protein functions with high recall while not lowering precision. Supplementary information is available at http://www.egr.vcu.edu/cs/dmb/ClusFCM/.
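Of the evaluation measures named above, the Matthews correlation coefficient is the least familiar. The sketch below computes it from confusion-matrix counts using the standard definition; the example counts are hypothetical and unrelated to the ClusFCM results.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef(tp=50, fp=0, tn=50, fn=0)
chance = matthews_corrcoef(tp=25, fp=25, tn=25, fn=25)
```

Unlike precision or recall alone, the coefficient uses all four cells of the confusion matrix, which makes it a useful check when a method raises recall and one wants to confirm that precision has not quietly dropped.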

11.
12.
The publication of unfounded health claims on small molecules in peer-reviewed scientific literature is a problem that requires attention. It undermines the evidence-based decision making processes of modern-day society, weakens the credibility of the scientific enterprise, and diverts resources to futile research efforts. In the present essay we discuss some human and scientific causes behind the issue. We propose a number of actions to be taken up by scientists, referees and publishers. One particularly important factor is the issue of enigmatic compound behavior in biological assays. We therefore also introduce the idea of biological filters, a pattern recognition method to triage enigmatic compounds into valuable hits and false positives, based on the entirety of their biological effects in cell-based systems.

13.
High-throughput assays, such as RNA-seq, to detect differential abundance are widely used. Variable performance across statistical tests, normalizations, and conditions leads to resource wastage and reduced sensitivity. EDDA represents a first, general design tool for RNA-seq, Nanostring, and metagenomic analysis, that rationally selects tests, predicts performance, and plans experiments to minimize resource wastage. Case studies highlight EDDA’s ability to model single-cell RNA-seq, suggesting ways to reduce sequencing costs up to five-fold and improving metagenomic biomarker detection through improved test selection. EDDA’s novel mode-based normalization for detecting differential abundance improves robustness by 10% to 20% and precision by up to 140%.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0527-7) contains supplementary material, which is available to authorized users.

14.
MOTIVATION: The development of experimental methods for genome scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network. The method exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors. A binomial model of local neighbor function labeling probability is combined with a Markov random field propagation algorithm to assign function probabilities for proteins in the network. RESULTS: We applied the method to a protein-protein interaction dataset for the yeast Saccharomyces cerevisiae using the Gene Ontology (GO) terms as function labels. The method reconstructed known GO term assignments with high precision, and produced putative GO assignments to 320 proteins that currently lack GO annotation, which represents about 10% of the unlabeled proteins in S. cerevisiae.
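The premise that graph neighbors tend to share functions can be illustrated with a simple neighbor-vote baseline. This is a deliberately simplified stand-in, not the binomial model or the Markov random field propagation described in the abstract; the protein identifiers and function labels are hypothetical.

```python
from collections import Counter


def predict_function(protein, adjacency, annotations):
    """Return the label most common among the protein's annotated neighbors.

    Ties are broken alphabetically; returns None if no neighbor is annotated.
    """
    votes = Counter()
    for neighbor in adjacency.get(protein, ()):
        votes.update(annotations.get(neighbor, ()))
    if not votes:
        return None
    # max() over the sorted labels keeps the first (alphabetically smallest) on ties.
    return max(sorted(votes), key=lambda label: votes[label])


# Hypothetical network: X is unlabeled, its neighbors carry labels.
adjacency = {"X": {"P1", "P2", "P3"}}
annotations = {
    "P1": {"transport"},
    "P2": {"transport", "translation"},
    "P3": {"transport", "translation"},
}

predicted = predict_function("X", adjacency, annotations)
```

A probabilistic model such as the one in the abstract goes further by assigning each candidate label a probability and propagating evidence beyond direct neighbors, which a plain vote cannot do.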

15.
Molecular biological methods that use antibodies and nucleic acids to detect specific foodborne bacterial pathogens were scarcely known a decade and a half ago. Few scientists could have predicted that these tools of basic research would come to dominate the field of food diagnostics. Today, a large number of cleverly designed assay formats using these technologies are available commercially for the detection in foods of practically all major established pathogens and toxins, as well as of many emerging pathogens. These tests range from very simple antibody-bound latex agglutination assays to very sophisticated DNA amplification methods. Although molecular biological assays are more specific, sensitive, and faster than conventional (often cultural) microbiological methods, the complexities of food matrices continue to offer unique challenges that may preclude the direct application of these molecular biological methods. Consequently, a short cultural enrichment period is still required for food samples prior to analysis with these assays. The greater detection sensitivity of molecular biological methods may also affect existing microbiological specifications for foods; this undoubtedly will have repercussions on the regulatory agencies, food manufacturers, and also consumers. The US government has the right to retain a nonexclusive royalty-free license in and to any copyright covering this article. Use of trade names is for identification only and does not imply an endorsement by the US FDA.

16.
AIMS: The aim of the work was to apply PCR-temperature gradient gel electrophoresis (PCR-TGGE) and restriction enzyme analysis (RE) assays to identify commercially available starters of Saccharomyces cerevisiae sensu stricto complex. METHODS AND RESULTS: To characterize an analysed pool of 62 active dry yeasts of different brands used in wine fermentation practices, classical microbiological tests were also performed as well as evaluation of contamination with lactic acid bacteria and non-Saccharomyces yeasts. PCR-TGGE and RE were used in order to provide fast and reliable methods to identify and differentiate enological yeasts. The proposed molecular methods made it possible to identify particular strains within 36 h after colony isolation and directly from dry yeast suspension. CONCLUSIONS: The methods are highly recommended to obtain reliable results on yeast strain differentiation in a significantly shorter time than classical fermentation tests. SIGNIFICANCE AND IMPACT OF THE STUDY: Differentiating yeast strains in a short time and without plating is a good tool for rapid discrimination among enological strains used as starters in enological practices.

17.
MOTIVATION: Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts. RESULTS: We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%. AVAILABILITY: The program is available on request from the authors.
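One way to picture "aligning relevant sentences" with dynamic programming is a longest-common-subsequence alignment over tokens, where the shared tokens form a candidate pattern. The abstract does not specify the alignment or scoring scheme, so plain LCS is an assumption, and the PROTEIN placeholder for dictionary matches is hypothetical.

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two token lists via dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Trace back from the bottom-right corner to recover the shared tokens.
    shared = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            shared.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return shared[::-1]


# Protein names are assumed to have been replaced by a PROTEIN placeholder.
s1 = "PROTEIN interacts directly with PROTEIN".split()
s2 = "PROTEIN interacts with the PROTEIN".split()
pattern = lcs_tokens(s1, s2)
```

The tokens that survive the alignment, here the placeholder, the key verb and its preposition, are what a matching step could then search for in unseen sentences.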

18.
Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross validation on a manually annotated corpus, our recognition tool generates an f-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.
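The abstract reports an f-score and a precision but not the recall. Assuming the f-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly, the implied recall can be recovered by solving F1 = 2PR / (P + R) for R:

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values from the abstract, in percent; the balanced-F1 reading is an assumption.
implied_recall = recall_from_f1(f1=78.49, precision=83.02)
```

Under that assumption the implied recall is roughly 74.4%, so the tool as reported is somewhat more conservative than it is complete.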

19.
Although figures in scientific articles have high information content and concisely communicate many key research findings, they are currently underutilized by literature search and retrieval systems. Many systems ignore figures, and those that do not typically only consider caption text. This study describes and evaluates a fully automated approach for associating figures in the body of a biomedical article with sentences in its abstract. We use supervised methods to learn probabilistic language models, hidden Markov models, and conditional random fields for predicting associations between abstract sentences and figures. Three kinds of evidence are used: text in abstract sentences and figures, relative positions of sentences and figures, and the patterns of sentence/figure associations across an article. Each information source is shown to have predictive value, and models that use all kinds of evidence are more accurate than models that do not. Our most accurate method has an F1-score of 69% on a cross-validation experiment, is competitive with the accuracy of human experts, has significantly better predictive accuracy than state-of-the-art methods and enables users to access figures associated with an abstract sentence with an average of 1.82 fewer mouse clicks. A user evaluation shows that human users find our system beneficial. The system is available at http://FigureItOut.askHERMES.org.

20.
A common assay to measure yeast metabolic activity in biofilms is based on the reduction of the tetrazolium salt XTT {2,3-bis (2-methoxy-4-nitro-5-sulfophenyl)-5-[(phenylamino) carbonyl]-2H-tetrazolium hydroxide} to a colored formazan. However, a recent report, also confirmed by our own findings about the shortcomings of the chromogenic XTT assay, has prompted us to investigate alternative methods for yeast biomass quantification. To this end, two fluorogenic assays using fluorescein diacetate (FDA) and SYTO 9 as well as the XTT assay were comparatively evaluated with regard to the linear range of Candida albicans and Candida parapsilosis cell number-response curves, precision and intra- and interspecies variability. Reading of fluorescence and absorbance was carried out in a multilabel microtiter plate reader. All three assays were adequate for the determination of planktonic yeast biomass, but the FDA and SYTO 9 assays present practical advantages. When applied to the quantification of yeast biofilm biomass obtained in the CDC biofilm reactor, the FDA assay proved superior.
