Similar literature
 Found 20 similar articles; search took 15 ms
1.
Protein glycosylation is an important post-translational event that plays a pivotal role in protein folding and protein trafficking. We describe a dictionary-based and a rule-based approach to mine 'mentions' of protein glycosylation in text. The dictionary-based approach relies on a set of manually curated dictionaries specially constructed to address this task. Abstracts are screened for 'mentions' of words from these dictionaries, which are then scored and classified on the basis of a threshold. The rule-based approach also relies on the words in the dictionaries to derive the features used for classification. The performance of the system using both approaches has been evaluated on a manually curated corpus of 3133 abstracts. The evaluation suggests that the rule-based approach outperforms the dictionary-based approach.
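The dictionary-based step described above (screen for dictionary words, score, apply a threshold) can be sketched roughly as follows. This is only an illustration: the term list, the weights, and the threshold are hypothetical, since the abstract does not give the curated dictionaries or the scoring scheme.

```python
import re

# Hypothetical term weights; the curated dictionaries from the abstract are not given.
GLYCO_TERMS = {
    "glycosylation": 2.0,
    "glycosylated": 2.0,
    "glycan": 1.0,
    "n-linked": 1.0,
    "o-linked": 1.0,
}

def score_abstract(text, terms=GLYCO_TERMS):
    """Sum the weights of dictionary terms, counting every occurrence."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    return sum(terms.get(tok, 0.0) for tok in tokens)

def mentions_glycosylation(text, threshold=2.0):
    """Classify an abstract as positive when its score reaches the threshold."""
    return score_abstract(text) >= threshold
```

In practice the threshold would presumably be tuned against an annotated corpus such as the 3133 abstracts mentioned above.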

2.
Frontiers of biomedical text mining: current progress   Cited by: 3 (self-citations: 0; citations by others: 3)
It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or 'BioNLP' in general, focusing primarily on papers published within the past year.

3.

Background  

While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined.

4.
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.

5.
MOTIVATION: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. METHODS: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. RESULTS AND CONCLUSION: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods.
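As a rough illustration of the vector space retrieval idea only (not the authors' system, which also uses a pattern matcher, stems and phrase units), category labels can be ranked by cosine similarity to the input text. The function names and the example labels below are assumptions made for this sketch.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; no stemming or phrase detection in this sketch."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_categories(text, categories):
    """Return category labels ordered from most to least similar to the text."""
    query = vectorize(text)
    scored = [(cosine(query, vectorize(label)), label) for label in categories]
    return [label for _, label in sorted(scored, key=lambda p: p[0], reverse=True)]
```

Because this relies on exact word overlap with the label, it needs no training data, which matches the "largely data-independent" design described above.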

6.
TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.

7.
Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry, or surface-enhanced laser desorption/ionization ProteinChip technology, has been widely used in obtaining the quantitative profiles of tissue proteomes, particularly plasma proteomes. Its high-throughput nature and the simplicity of its experimental procedures have allowed this technology to become a popular research tool for biomarker discovery in the past 5 years. Having accumulated more research experience, researchers now have a better understanding of the characteristics and limitations of this technology, as well as the pitfalls in biomarker research, by undertaking a comparative proteomic approach. This review provides an overview of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry, discusses its limitations and provides some possible solutions to help apply this technology to biomarker research.

9.
The stability of population oscillations in ecological systems is considered. Experiments suggest that in many cases the single patch dynamics of predator-prey or host-parasite systems is extinction prone, and stability is achieved only when the spatial structure of the population is expressed via desynchronization between patches. A few mechanisms have been suggested so far to explain the inability of dispersal to synchronize the system. Here we compare a recently discovered mechanism, based on the dependence of the angular velocity on the oscillation amplitude, with other, already known conditions for desynchronization. Using a toy model composed of diffusively coupled oscillators we suggest a classification scheme for stability mechanisms, a scheme that allows for either a priori (based on the system parameters) or a posteriori (based on local measurements) identification of the dominant process that yields desynchronization.
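To make the ingredients concrete, here is a minimal sketch of two diffusively coupled oscillators whose angular velocity depends on amplitude. It is not the authors' model: the form omega0 + alpha * r**2, the parameter values, and the initial conditions are all assumptions chosen for illustration, and noise is left out.

```python
import cmath

def step(z1, z2, omega0, alpha, coupling, dt):
    """Advance two patches by one time step: rotation, then diffusive exchange."""
    # Each patch rotates at a rate that depends on its own amplitude |z|.
    z1 = z1 * cmath.exp(1j * (omega0 + alpha * abs(z1) ** 2) * dt)
    z2 = z2 * cmath.exp(1j * (omega0 + alpha * abs(z2) ** 2) * dt)
    # Dispersal moves a fraction of the difference between the patches.
    exchange = coupling * dt * (z2 - z1)
    return z1 + exchange, z2 - exchange

def final_separation(alpha, coupling=0.1, omega0=1.0, dt=0.01, steps=5000):
    """Distance between the two patch states after integrating the toy model."""
    z1, z2 = complex(1.0, 0.0), complex(0.0, 0.5)
    for _ in range(steps):
        z1, z2 = step(z1, z2, omega0, alpha, coupling, dt)
    return abs(z1 - z2)
```

With `alpha=0` both patches rotate at the same rate, so the exchange term alone shrinks their separation geometrically. Calling `final_separation` with nonzero `alpha` lets one explore how amplitude-dependent rotation changes that picture.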

10.
Background: Cure models can provide improved possibilities for inference if used appropriately, but there is potential for misleading results if care is not taken. In this study, we compared five commonly used approaches for modelling cure in a relative survival framework and provide some practical advice on the use of these approaches. Patients and methods: Data for colon, female breast, and ovarian cancers were used to illustrate these approaches. The proportion cured was estimated for each of these three cancers within each of three age groups. We then graphically assessed the assumption of cure and the model fit, by comparing the predicted relative survival from the cure models to empirical life table estimates. Results: Where both cure and distributional assumptions are appropriate (e.g., for colon or ovarian cancer patients aged <75 years), all five approaches led to similar estimates of the proportion cured. The estimates varied slightly when cure was a reasonable assumption but the distributional assumption was not (e.g., for colon cancer patients ≥75 years). Greater variability in the estimates was observed when the cure assumption was not supported by the data (breast cancer). Conclusions: If the data suggest cure is not a reasonable assumption then we advise against fitting cure models. In the scenarios where cure was reasonable, we found that flexible parametric cure models performed at least as well as, or better than, the other modelling approaches. We recommend that, regardless of the model used, the underlying assumptions for cure and model fit should always be graphically assessed.
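For orientation, one widely used formulation is the mixture cure model in a relative survival setting. The abstract does not list which five approaches were compared, so this is given only as a standard example; the symbols $\pi$ (proportion cured), $S_u(t)$ (survival of the uncured), $S^*(t)$ (expected survival in the general population) and $R(t)$ (relative survival) are notation assumed here.

```latex
\begin{align}
  S(t) &= S^*(t)\, R(t), \\
  R(t) &= \pi + (1 - \pi)\, S_u(t), \\
  \lim_{t \to \infty} R(t) &= \pi \quad \text{provided } S_u(t) \to 0 .
\end{align}
```

The last line is why a plateau in the empirical relative survival curve is the natural graphical check of the cure assumption: if the curve keeps falling, no finite $\pi$ is supported by the data.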

11.
One hundred and seven sample plots were established on a study area at Jabiluka, Northern Territory, and detailed quantitative floristic and structural data were collected. Data collection was by sampling both on aerial photographs and in the field, and both sets of data were used to describe the primary floristic types and structural sub-types. Cluster analysis (Orloci 1967, 1969), polar ordination (Mathews 1977) and Specht's (1970, 1977) approach to vegetation classification were used to analyse the data. Two independent clustering techniques, one based on an information measure and the other on a measure of within-group dispersion, produced very similar dendrograms. The analyses consistently separated the plots into three major groups - floodplain, dryland and sandstone landscapes; within these groups 15 floristic associations and eight structural formations were identified. The environmental parameters associated with the various groups were substrate type and seasonal inundation from the Magela Creek system. The results of ordination did not highlight any environmental parameters not already made evident by cluster analysis.

12.

Background  

Fluorescence microscopy is widely used to determine the subcellular location of proteins. Efforts to determine location on a proteome-wide basis create a need for automated methods to analyze the resulting images. Over the past ten years, the feasibility of using machine learning methods to recognize all major subcellular location patterns has been convincingly demonstrated, using diverse feature sets and classifiers. On a well-studied data set of 2D HeLa single-cell images, the best performance to date, 91.5%, was obtained by including a set of multiresolution features. This demonstrates the value of multiresolution approaches to this important problem.

13.
MOTIVATION: The sheer volume of textually described biomedical knowledge creates a need for natural language processing (NLP) applications that allow flexible and efficient access to relevant information. Specialized semantic networks (such as biomedical ontologies, terminologies or semantic lexicons) can significantly enhance these applications by supplying the necessary terminological information in a machine-readable form. With the explosive growth of bio-literature, new terms (representing newly identified concepts or variations of the existing terms) may not be explicitly described within the network and hence cannot be fully exploited by NLP applications. Linguistic and statistical clues can be used to extract many new terms from free text. The extracted terms still need to be correctly positioned relative to other terms in the network. Classification as a means of semantic typing represents the first step in updating a semantic network with new terms. RESULTS: The MaSTerClass system implements the case-based reasoning methodology for the classification of biomedical terms.

14.
SUMMARY: PDQ Wizard automates the process of interrogating biomedical references using large lists of genes, proteins or free text. Using the principle of linkage through co-citation, biologists can mine PubMed with these proteins or genes to identify relationships within a biological field of interest. In addition, PDQ Wizard provides novel features to define more specific relationships, highlight key publications describing those activities and relationships, and enhance protein queries. PDQ Wizard also outputs a metric that can be used for prioritization of genes and proteins for further research. AVAILABILITY: PDQ Wizard is freely available from http://www.gti.ed.ac.uk/pdqwizard/.

15.

Background  

Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on words of the concept strings and compare it to the contextual syntactic approach.

16.
17.
Covalent bond formation to proteins is made difficult by their multiple unprotected functional groups and normally low concentrations. A water-soluble sulfonated bathophenanthroline ligand (2) was used to promote a highly efficient Cu(I)-mediated azide-alkyne cycloaddition (CuAAC) reaction for the chemoselective attachment of biologically relevant molecules to cowpea mosaic virus (CPMV). The ligated substrates included complex sugars, peptides, poly(ethylene oxide) polymers, and the iron carrier protein transferrin, with routine success even for cases that were previously resistant to azide-alkyne coupling using the conventional ligand tris(triazolyl)amine (1). The use of 4-6 equiv of substrate was sufficient to achieve loadings of 60-115 molecules/virion in yields of 60-85%. Although it is sensitive to oxygen, the reliably efficient performance of the Cu·2 system makes it a useful tool for demanding bioconjugation applications.

18.
Guyuron B, Uzzo CD, Scull H. Plastic and Reconstructive Surgery, 1999, 104(7): 2202-9; discussion 2210-2
The conventional designation of septal pathology is a deviated septum, and the common treatment of choice is submucous resection of the septum. These limited generic terms leave the surgery open to frequent failure and render the education of this topic suboptimal. During 1224 septal surgeries, we have observed six different categories of septal deviation requiring different surgical treatments. A study was conducted to investigate the frequency of different classes of septal deviation and to develop guidelines for a more successful surgical correction of each category. Ninety-three consecutive patients who underwent septoplasty were carefully evaluated for the type of septal deformity, age, gender, history of trauma, and previous septal surgery. The surgical technique was reviewed for each category of the septal deformity. Of the 93 patients, 71 were women and 22 were men. Ages ranged from 13 to 76, with an average age of 31.5. Most patients exhibited a "septal tilt" deformity (40 percent; 37 of 93) or a C-shape anteroposterior deviation (32 percent; 30 of 93). The other deformities were C-shape cephalocaudal (4 percent; 4 of 93), S-shape anteroposterior (9 percent; 8 of 93), S-shape cephalocaudal (1 percent; 1 of 93), or localized deviations or large spurs (14 percent; 13 of 93). Each of the six categories of septal deviation requires specific management. If a single procedure is selected for all of the septal deformities, disappointing results may ensue.

19.
20.
Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein–binding templates. Classification is based on the architectural composition of associated Cas proteins, but considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities for both a quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.
