Similar literature
 20 similar articles found
1.
SUMMARY: BioIE is a rule-based system that extracts informative sentences relating to protein families, their structures, functions and diseases from the biomedical literature. Based on manual definition of templates and rules, it aims at precise sentence extraction rather than wide recall. After uploading source text or retrieving abstracts from MEDLINE, users can extract sentences based on predefined or user-defined template categories. BioIE also provides a brief insight into the syntactic and semantic context of the source text by looking at word, N-gram and MeSH-term distributions. Important applications of BioIE include, for example, annotation of microarray data and of protein databases. AVAILABILITY: http://umber.sbs.man.ac.uk/dbbrowser/bioie/
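The word and N-gram distributions mentioned in the abstract above can be illustrated with a minimal sketch. The tokenizer, the function name `ngram_counts` and the sample sentence are assumptions made for illustration; this is not BioIE's own implementation.

```python
from collections import Counter
import re


def ngram_counts(text, n):
    """Count word n-grams in text (lowercased, split on non-alphanumerics)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # zip over n shifted copies of the token list yields consecutive n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


# Hypothetical sample text, not taken from MEDLINE
sample = "Protein kinase C binds calcium. Protein kinase C is activated by calcium."
bigrams = ngram_counts(sample, 2)
unigrams = ngram_counts(sample, 1)
print(bigrams.most_common(2))
```

Note that this simple tokenizer ignores sentence boundaries, so an N-gram can span two sentences; a real system would probably split sentences first.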

2.
MOTIVATION: Contrasts are useful conceptual vehicles for learning processes and exploratory research of the unknown. For example, contrastive information between proteins can reveal the similarities, divergences and relations between the two proteins, leading to invaluable insights and a better understanding of the proteins. Such contrastive information is reported in the biomedical literature. However, there have been no reported attempts in current biomedical text mining work to systematically extract and present such useful contrastive information from the literature for exploitation. RESULTS: Our BioContrasts system extracts protein-protein contrastive information from MEDLINE abstracts and presents the information to biologists in a web application for exploitation. Contrastive information is identified in the text abstracts with contrastive negation patterns such as 'A but not B'. A total of 799 169 pairs of contrastive expressions were successfully extracted from 2.5 million MEDLINE abstracts. Using grounding of contrastive protein names to Swiss-Prot entries, we were able to produce 41 471 contrasts between Swiss-Prot protein entries. These contrastive pieces of information are then presented via a user-friendly interactive web portal that can be exploited for applications such as the refinement of biological pathways. AVAILABILITY: BioContrasts can be accessed at http://biocontrasts.i2r.a-star.edu.sg. It is also mirrored at http://biocontrasts.biopathway.org. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
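The 'A but not B' negation pattern named in the abstract above can be sketched with a regular expression. This is a deliberately simplified illustration: the pattern, the function name `find_contrasts` and the example sentence are assumptions, and the real BioContrasts system additionally grounds the names to Swiss-Prot entries, which is not attempted here.

```python
import re

# Hypothetical, simplified pattern: two single-token names joined by "but not".
CONTRAST = re.compile(r"\b([A-Za-z0-9-]+) but not ([A-Za-z0-9-]+)\b")


def find_contrasts(sentence):
    """Return (A, B) pairs for every 'A but not B' occurrence in the sentence."""
    return CONTRAST.findall(sentence)


# Invented example sentence, for illustration only
pairs = find_contrasts(
    "Phosphorylation was observed for ERK2 but not JNK1 in treated cells."
)
print(pairs)
```

A single-token pattern like this one would miss multi-word protein names and would also match non-protein words, so a name recognizer would probably be needed on either side of the pattern.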

3.

Background:

Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as the yeast two-hybrid screening has produced an explosion in data relating to interactions. Since manual curation is intensive in terms of time and cost, there is an urgent need for text-mining tools to facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.

Results:

During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieves a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing mentions to molecule identifiers, our method is competitive among runs submitted, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these findings are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to problems with correctly extracting interactions for this system.
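As a worked check of the figures above, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below reproduces the 28.5% reported for the normalization task and the "three-quarters" share of false negatives (532/698). The function name is an assumption for illustration. The pair-extraction F1 values are not checked, because they do not equal the harmonic mean of the listed precision and recall; one possible explanation, assumed here and not stated in the abstract, is that they were averaged per article.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Normalization task: precision 34.83%, recall 24.10%
norm_f1 = f1_score(34.83, 24.10)
print(round(norm_f1, 1))  # 28.5

# Share of false negatives attributed to protein normalization problems
fn_share = 532 / 698
print(round(fn_share, 2))  # 0.76
```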

Conclusion:

We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.

4.
Filamentous fungi and yeast from the genera Saccharomyces, Penicillium, Aspergillus, and Fusarium are well known for their impact on our life as pathogens, involved in food spoilage by degradation or toxin contamination, and also for their wide use in biotechnology for the production of beverages, chemicals, pharmaceuticals, and enzymes. The genomes of these eukaryotic micro-organisms range from about 6000 genes in yeasts (S. cerevisiae) to more than 10,000 genes in filamentous fungi (Aspergillus sp.). Yeast and filamentous fungi are expected to share much of their primary metabolism; therefore much understanding of the central metabolism and regulation in less-studied filamentous fungi can be learned from comparative metabolite profiling and metabolomics of yeast and filamentous fungi. Filamentous fungi also have a very active and diverse secondary metabolism in which many of the additional genes present in fungi, compared with yeast, are likely to be involved. Although the 'blueprint' of a given organism is represented by the genome, its behaviour is expressed as its phenotype, i.e. growth characteristics, cell differentiation, response to the environment, the production of secondary metabolites and enzymes. Therefore the profile of (secondary) metabolites--fungal chemodiversity--is important for functional genomics and in the search for new compounds that may serve as biotechnology products. Fungal chemodiversity is, however, equally efficient for identification and classification of fungi, and hence a powerful tool in fungal taxonomy. In this paper, the use of metabolite profiling is discussed for the identification and classification of yeasts and filamentous fungi, functional analysis or discovery by integration of high performance analytical methodology, efficient data handling techniques and core concepts of species, and intelligent screening. 
One very efficient approach is direct infusion Mass Spectrometry (diMS) integrated with automated data handling, but a full metabolic picture requires the combination of several different analytical techniques.

5.
Experimental samples are valuable and can represent a significant investment in time and resources. It is highly desirable at times to obtain as much information as possible from a single sample. This is especially relevant for systems biology approaches in which several ‘omics platforms are studied simultaneously. Unfortunately, each platform has a particular extraction methodology, which increases sample number and sample volume requirements when multiple ‘omics are analyzed. We evaluated the integration of a yeast extraction method; specifically, we explored whether fractions from a single metabolite extraction could be apportioned to multiple downstream ‘omics analytical platforms. In addition, we examined how variations to a chloroform/methanol yeast metabolite extraction regime influence metabolite recoveries. We show that protein suitable for proteomic analysis can be recovered from a metabolite extraction and that recovery of lipids, while reproducible, is not wholly quantitative. Higher quenching solution temperatures (−30 °C) can be used without significant leakage of intracellular metabolites when lower fermentation temperatures (20 °C) are employed. However, extended residence time in quenching solution, in combination with vigorous washing of quenched cell pellets, leads to extensive leakage of intracellular metabolites. Finally, there is minimal difference in metabolite amounts obtained when metabolite extractions are performed at 4 °C compared to extractions at −20 °C. The evaluated extraction method delivers material suitable for metabolomic and proteomic analyses from the same sample preparation.

6.
7.
The article describes the development of a new method for the extraction of intracellular glycolytic metabolites from bacterial cells. The study was carried out on a culture of E. coli B/r CSH. In this method, the same bacterial filter is used for both filtration (the removal of the culture fluid) and the extraction of low-molecular components of the cells with perchloric acid. The advantage of this method is the absence of unnecessary operations due to the use of a filter installation designed by the author. Quantitatively, this method yields better and reproducible results. The filtration capacity of different types of filters has been analyzed. The optimal time for the extraction of low-molecular cell components has been determined. The concentration of pyruvate has been shown to change during the cell cycle of a synchronous E. coli culture grown in the presence of glucose. The newly developed method of extraction can be used not only for E. coli, but also for cells of other types.

8.
MOTIVATION: The rate at which gene-related findings appear in the scientific literature makes it difficult if not impossible for biomedical scientists to keep fully informed and up to date. The importance of these findings argues for the development of automated methods that can find, extract and summarize this information. This article reports on methods for determining the molecular function claims that are being made in a scientific article, specifically those that are backed by experimental evidence. RESULTS: The most significant result is that for molecular function claims based on direct assays, our methods achieved recall of 70.7% and precision of 65.7%. Furthermore, our methods correctly identified in the text 44.6% of the specific molecular function claims backed up by direct assays, but with a precision of only 0.92%, a disappointing outcome that led to an examination of the different kinds of errors. These results were based on an analysis of 1823 articles from the literature of Saccharomyces cerevisiae (budding yeast). AVAILABILITY: The annotation files for S. cerevisiae are available from ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/gene_association.sgd.gz. The draft protocol vocabulary is available by request from the first author.

9.
MBA: a literature mining system for extracting biomedical abbreviations   Total citations: 1, self-citations: 0, citations by others: 1

Background  

The explosive growth of the biomedical literature presents many challenges for biological researchers. One such challenge arises from the use of a great many abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule based, machine learning based, text alignment based and statistically based. State of the art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method is used to classify the abbreviations into acronym-type and non-acronym-type abbreviations, and then their corresponding definitions are identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter.
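To illustrate the text alignment idea for acronym-type abbreviations described above, here is a minimal sketch that aligns each letter of a parenthesized short form with the initial letter of the words immediately preceding it. This first-letter alignment is a much simpler stand-in for the alignment algorithm in the paper, whose details the abstract does not give; the function name and the example sentence are assumptions for illustration.

```python
import re


def find_acronym_definitions(text):
    """Return (abbreviation, definition) pairs for 'long form (SHORT)' patterns
    where each letter of SHORT matches the initial of one preceding word."""
    results = []
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = m.group(1)
        # Words that appear before the opening parenthesis
        words = re.findall(r"[A-Za-z]+", text[: m.start()])
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            w[0].upper() == c for w, c in zip(candidate, abbr)
        ):
            results.append((abbr, " ".join(candidate)))
    return results


# Invented example sentence, for illustration only
pairs = find_acronym_definitions(
    "Samples were analyzed by gas chromatography (GC) and mass spectrometry (MS)."
)
print(pairs)
```

Non-acronym-type abbreviations (for example, short forms that skip words or take letters from inside a word) would fail this check, which is consistent with the abstract's point that they need a different, statistical treatment.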

10.
11.
The biological mechanisms that direct the generation and accumulation of the vast diversity of metabolites observed in the plant kingdom are not fully understood. An exciting and promising approach to understand these mechanisms is described in the paper by Xie et al. (2009). The authors have coupled state of the art metabolomic analyses with novel bioinformatic techniques to identify apparent ‘metabolic modules’ in turmeric (Curcuma longa) rhizomes. A metabolic module is defined as a group of co-regulated metabolites and this approach elegantly represents a basic innovative and practical attempt to understand and predict metabolic pathways using detailed bioinformatics data mining following careful and well-documented GC-MS and LC-MS

12.
13.
A metabolome pipeline: from concept to data to knowledge   Total citations: 5, self-citations: 3, citations by others: 5
Metabolomics, like other omics methods, produces huge datasets of biological variables, often accompanied by the necessary metadata. However, regardless of the form in which these are produced they are merely the ground substance for assisting us in answering biological questions. In this short tutorial review and position paper we seek to set out some of the elements of “best practice” in the optimal acquisition of such data, and in the means by which they may be turned into reliable knowledge. Many of these steps involve the solution of what amount to combinatorial optimization problems, and methods developed for these, especially those based on evolutionary computing, are proving valuable. This is done in terms of a “pipeline” that goes from the design of good experiments, through instrumental optimization, data storage and manipulation, the chemometric data processing methods in common use, and the necessary means of validation and cross-validation for giving conclusions that are credible and likely to be robust when applied in comparable circumstances to samples not used in their generation. This revised version was published online in June 2005. The previous version did not contain colour images.

14.
Concentrations of intermediary metabolites in yeast   Total citations: 15, self-citations: 0, citations by others: 15
J M Gancedo, C Gancedo. Biochimie, 1973, 55(2): 205-211

15.
16.
The SFF file format produced by Roche's 454 sequencing technology is a compact, binary format that contains the flow values that are used for base and quality calling of the reads. Applications, e.g. in metagenomics, often depend on accurate sequence information, and access to flow values is important to estimate the probability of errors. Unfortunately, the programs supplied by Roche for accessing this information are not publicly available. Flower is a program that can extract the information contained in SFF files, and convert it to various textual output formats. AVAILABILITY: Flower is freely available under the General Public License.

17.
Currently, literature is integrated in systems biology studies in three ways. Hand-curated pathways have been sufficient for assembling models in numerous studies. Second, literature is frequently accessed in a derived form, such as the concepts represented by the Medical Subject Headings (MeSH) and Gene Ontologies (GO), or functional relationships captured in protein-protein interaction (PPI) databases; both of these are convenient, consistent reductions of more complex concepts expressed as free text in the literature. Moreover, their contents are easily integrated into computational processes required for dealing with large data sets. Last, mining text directly for specific types of information is on the rise as text analytics methods become more accurate and accessible. These uses of literature, specifically manual curation, derived concepts captured in ontologies and databases, and indirect and direct application of text mining, will be discussed as they pertain to systems biology.

18.
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of genomics and proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years, there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature and find the nuggets of information most relevant and useful for specific analysis tasks. This paper provides a road map to the various literature-mining methods, both in general and within bioinformatics. It surveys the disciplines involved in unstructured-text analysis, categorizes current work in biomedical literature mining with respect to these disciplines, and provides examples of text analysis methods applied towards meeting some of the current challenges in bioinformatics.

19.
A method for the global analysis of yeast intracellular metabolites, based on electrospray mass spectrometry (ES-MS), has been developed. This has involved the optimization of methods for quenching metabolism in Saccharomyces cerevisiae and extracting the metabolites for analysis by positive-ion electrospray mass spectrometry. The influence of cultivation conditions, sampling, quenching and extraction conditions, concentration step, and storage have all been studied and adapted to allow direct infusion of samples into the mass spectrometer and the acquisition of metabolic profiles with simultaneous detection of more than 25 intracellular metabolites. The method, which can be applied to other micro-organisms and biological systems, may be used for comparative analysis and screening of metabolite profiles of yeast strains and mutants under controlled conditions in order to elucidate gene function via metabolomics. Examples of the application of this analytical strategy to specific yeast strains and single-ORF yeast deletion mutants generated through the EUROFAN programme are presented.

20.
Toward the storage metabolome: profiling the barley vacuole   Total citations: 2, self-citations: 0, citations by others: 2
While recent years have witnessed dramatic advances in our capacity to identify and quantify an ever-increasing number of plant metabolites, our understanding of how metabolism is spatially regulated is still far from complete. In an attempt to partially address this question, we studied the storage metabolome of the barley (Hordeum vulgare) vacuole. For this purpose, we used highly purified vacuoles isolated by silicon oil centrifugation and compared their metabolome with that found in the mesophyll protoplast from which they were derived. Using a combination of gas chromatography-mass spectrometry and Fourier transform-mass spectrometry, we were able to detect 59 (primary) metabolites for which we know the exact chemical structure and a further 200 (secondary) metabolites for which we have strong predicted chemical formulae. Taken together, these metabolites comprise amino acids, organic acids, sugars, sugar alcohols, shikimate pathway intermediates, vitamins, phenylpropanoids, and flavonoids. Of the 259 putative metabolites, some 12 were found exclusively in the vacuole and 34 were found exclusively in the protoplast, while 213 were common in both samples. When analyzed on a quantitative basis, however, there is even more variance, with more than 60 of these compounds being present above the detection limit of our protocols. The combined data were also analyzed with respect to the tonoplast proteome in an attempt to infer specificities of the transporter proteins embedded in this membrane. Following comparison with recent observations made using nonaqueous fractionation of Arabidopsis (Arabidopsis thaliana), we discuss these data in the context of current models of metabolic compartmentation in plants.
