Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Literature search is a process in which external developers provide alternative representations for efficient data mining of biomedical literature, such as ranking search results, displaying summarized knowledge of semantics, and clustering results into topics. In clustering search results, prominent vocabularies such as GO (Gene Ontology) and MeSH (Medical Subject Headings), as well as frequent terms extracted from retrieved PubMed abstracts, have been used as topics for grouping. In this study, we propose the FNeTD (Frequent Nearer Terms of the Domain) method for clustering PubMed abstracts. This is achieved through a two-step process: (i) identifying frequent words or phrases in the abstracts through the frequent multi-word extraction algorithm, and (ii) identifying nearer terms of the domain from the extracted frequent phrases using nearest-neighbor search. The efficiency of clustering PubMed abstracts using nearer terms of the domain was measured using the F-score. The present study suggests that nearer terms of the domain can be used for clustering search results.
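As a rough illustration of the first step described in the abstract above, the sketch below counts word n-grams that recur across several abstracts, and also computes an F-score. It is not the FNeTD algorithm: the function names, the `min_docs` threshold, the simple tokenizer, and the choice of the balanced F1 form are all assumptions made for this example.

```python
from collections import Counter
import re


def frequent_phrases(abstracts, n=2, min_docs=2):
    """Return word n-grams that occur in at least `min_docs` abstracts.

    Hypothetical helper: the abstract does not specify its extraction rules.
    """
    doc_counts = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        # Use a set so each abstract counts a given phrase at most once.
        ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_counts.update(ngrams)
    return sorted(p for p, c in doc_counts.items() if c >= min_docs)


def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Counting each phrase once per abstract keeps a single long document from dominating the candidate list, which seems a reasonable default when the phrases are meant to serve as cluster topics.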

2.
Although plastic surgeons have empirically "known" of the benefits of reduction mammaplasty for their patients, a paucity of outcome studies has been reported. For this study, an attempt to perform a meta-analysis of outcomes in reduction mammaplasty was undertaken. A computer literature search was performed of the MEDLINE database for the period between 1966 and September of 1997 for the Medical Subject Headings mammaplasty and outcome measures. Reference lists were used for additional reports. No trials were identified that met the criteria for meta-analysis. Seventeen publications met less restrictive review criteria that evaluated quality-of-life outcome measures. A systematic evaluation of patient-focused outcome measures demonstrated that consistent improvement in physical symptoms was found across most studies, as was a high degree of patient satisfaction (78 to 95 percent very or moderately satisfied), and some have shown improvement in body image and psychological well-being. However, although this review does identify consistent improvements in patient quality of life after reduction mammaplasty, inconsistencies among study designs do not allow formal meta-analysis.

3.
Use of keyword hierarchies to interpret gene expression patterns (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. RESULTS: We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
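One way to picture how a keyword hierarchy can summarize a gene cluster: MeSH headings carry dot-separated tree numbers, so every prefix of a tree number names a broader concept. The sketch below rolls gene-linked tree numbers up to their ancestors and counts how many genes fall under each node. It is only an illustration under assumptions: the gene names and tree numbers are made up, and the abstract does not say that the authors compute counts this way.

```python
from collections import Counter


def ancestors(tree_number):
    """Return the tree number and all of its broader prefixes, broadest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def rollup(gene_to_tree_numbers):
    """Count, for every hierarchy node, how many genes are indexed under it."""
    counts = Counter()
    for tree_numbers in gene_to_tree_numbers.values():
        nodes = set()
        for tree_number in tree_numbers:
            nodes.update(ancestors(tree_number))
        # Each gene contributes at most once to any given node.
        counts.update(nodes)
    return counts


# Hypothetical cluster: two genes indexed under sibling headings.
cluster = {"geneA": ["D08.811.277"], "geneB": ["D08.811.600"]}
shared = rollup(cluster)
```

Here `shared["D08.811"]` is 2 while each leaf has a count of 1, so the two genes look unrelated at the most specific level but share a broader concept one level up.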

4.
5.
MOTIVATION: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. METHODS: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. RESULTS AND CONCLUSION: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods.

6.
The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically-relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily-appreciable scientific textual ‘tokens’ from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

7.
In pursuit of a better, up-to-date source including 'omics' information for breast cancer, the Breast Cancer Database (BCDB) has been developed to provide researchers with a quick overview of breast cancer and other relevant information. The database comprises a wide range of information about genes involved in breast cancer, their functions, and drug molecules currently used in the treatment of breast cancer. The data available in BCDB are retrieved from the biomedical research literature. It enables the user to search for information on a gene, its chromosomal location, its functions, and its importance in cancer. Broadly, it can be queried by gene name, protein name, and drug name. The database is platform independent, user friendly, and freely accessible through the internet. The data in BCDB are directly linked to other online resources such as NCBI, PDB, and PubMed. Hence, it can act as a complete web resource comprising gene sequences, drug structures, and literature information related to breast cancer, which is not available in any other breast cancer database. AVAILABILITY: The database is freely available at http://122.165.25.137/bioinfo/breastcancerdb/

8.
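The onset and progression of cancer are closely linked to genetic alterations in the body, which manifest clinically as abnormal symptoms or test indicators. Mining and analyzing the relationships between clinical manifestations and genetic alterations can provide clinical decision support for early diagnosis and precision treatment of cancer. Starting from literature data, using conclusive data to mine the relationships between genes and clinical manifestations is of great significance. This paper proposes a method for mining relationships between biomedical entities based on Medical Subject Headings (MeSH). The method uses the literature information provided by PubMed, borrows the idea of the vector space model, represents each entity under study as a vector of MeSH headings, and introduces citation relationships between articles to correct the results, turning relationship mining into mathematical operations between vectors and enabling quantitative analysis. The method was applied to the study of relationships between clinical manifestations and genes in colorectal cancer, yielding 203 genes associated with colorectal cancer and 462 corresponding clinical manifestation-gene relationships. The results were analyzed and validated with gene function and pathway analysis tools such as g:Profiler and KEGG. The results show that the MeSH-based literature mining method avoids both the limits that the traditional "co-occurrence" approach places on discovering latent relationships and the heavy computation required by complex semantic analysis, and offers a new idea and method for mining latent relationships between biological entities.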
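The vector space idea in the abstract above can be sketched as follows: each entity (a gene or a clinical manifestation) becomes a vector of MeSH heading counts taken from the articles linked to it, and the relationship between two entities is scored by the cosine of their vectors. This is a minimal sketch under assumptions: the helper names and sample headings are invented for the example, cosine is one plausible choice of vector operation, and the citation-based correction mentioned in the abstract is left out because its form is not given.

```python
import math
from collections import Counter


def mesh_vector(articles):
    """Sum MeSH heading counts over the articles linked to one entity.

    `articles` is a list of heading lists, one list per article.
    """
    vec = Counter()
    for headings in articles:
        vec.update(headings)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical inputs: headings of articles linked to a gene and to a symptom.
gene = mesh_vector([["Colorectal Neoplasms", "Mutation"], ["Mutation"]])
symptom = mesh_vector([["Colorectal Neoplasms", "Anemia"]])
score = cosine(gene, symptom)
```

Because the two entities never need to appear in the same article, a score above zero can arise purely from shared indexing terms, which is the property the abstract contrasts with co-occurrence methods.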

9.
RNAMotif, an RNA secondary structure definition and search algorithm (total citations: 26; self-citations: 7; citations by others: 19)
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including the complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important aspects: first, the structure definition language is more flexible and can specify any type of base–base interaction; second, RNAMotif provides a user controlled scoring section that can be used to add capabilities that patterns alone cannot provide.

10.
A computer program that facilitates the creation of a culture collection database has been written for a microcomputer (Apple IIe with a Z-80 card) using dBASE II® (Ashton-Tate). The Culture Collection Program accommodates up to 250 individual strain records on one 5 1/4" floppy disk. For each strain, information that can be stored includes the name of the micro-organism, culture collection number, antibiotic resistance markers, plasmids, genetic markers, references, growth medium, growth temperature and additional comments. The last date of subculturing can be ascertained and information about the status of the preserved cultures can also be noted. With a menu-driven format which requires no computer programming expertise, the user can readily create new entries, update old ones and search the database for strains with certain common properties.

11.
Currently, literature is integrated in systems biology studies in three ways. First, hand-curated pathways have been sufficient for assembling models in numerous studies. Second, literature is frequently accessed in a derived form, such as the concepts represented by the Medical Subject Headings (MeSH) and Gene Ontologies (GO), or functional relationships captured in protein-protein interaction (PPI) databases; both of these are convenient, consistent reductions of more complex concepts expressed as free text in the literature. Moreover, their contents are easily integrated into computational processes required for dealing with large data sets. Last, mining text directly for specific types of information is on the rise as text analytics methods become more accurate and accessible. These uses of literature, specifically manual curation, derived concepts captured in ontologies and databases, and indirect and direct application of text mining, will be discussed as they pertain to systems biology.

12.
B Billoud, M Kontic, A Viari. Nucleic Acids Research, 1996, 24(8):1395-1403
At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made at writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements (basically helices) that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified.

13.
We have developed PubNet, a web-based tool that extracts several types of relationships returned by PubMed queries and maps them into networks, allowing for graphical visualization, textual navigation, and topological analysis. PubNet supports the creation of complex networks derived from the contents of individual citations, such as genes, proteins, Protein Data Bank (PDB) IDs, Medical Subject Headings (MeSH) terms, and authors. This feature allows one to, for example, examine a literature-derived network of genes based on functional similarity.

14.
SUMMARY: MeSHer uses a simple statistical approach to identify biological concepts in the form of Medical Subject Headings (MeSH terms) obtained from the PubMed database that are significantly overrepresented within the identified gene set relative to those associated with the overall collection of genes on the underlying DNA microarray platform. As a demonstration, we apply this approach to gene lists acquired from a published study of the effects of angiotensin II (Ang II) treatment on cardiac gene expression and demonstrate that this approach can aid in the interpretation of the resulting 'significant' gene set. AVAILABILITY: The software is available at http://www.tm4.org. SUPPLEMENTARY INFORMATION: Results from the analysis of significant genes from the published Ang II study.
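Overrepresentation of a term in a gene set relative to the whole array is commonly tested with a hypergeometric tail probability. The abstract above does not name the statistic MeSHer uses, so the sketch below is an assumption about what such a test could look like, with hypothetical parameter names, and not a description of the software.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k term-annotated genes in a set of n.

    N: genes on the array, K: array genes annotated with the term,
    n: genes in the selected set, k: selected genes annotated with the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with 10 genes on the array, 3 annotated with a term, and a selected set of 2 genes, `hypergeom_pvalue(1, 2, 3, 10)` gives 24/45, about 0.53, so seeing one annotated gene in that set would not be notable. When many terms are tested, a multiple-testing correction would normally follow.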

15.
16.
Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedical study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique, PBC (Passage-based Bibliographic Coupling), that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.
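Plain bibliographic coupling, the baseline that the abstract above builds on, scores two articles by the references they share. The sketch below shows only that baseline, with a cosine-style normalization chosen here as an assumption. It does not implement PBC, whose use of context passages around citations is not specified in enough detail in the abstract, and the function names are hypothetical.

```python
def coupling_strength(refs_a, refs_b):
    """Number of references cited by both articles."""
    return len(set(refs_a) & set(refs_b))


def coupling_similarity(refs_a, refs_b):
    """Shared references normalized by the sizes of both reference lists."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5
```

Because the score depends only on an article's own reference list, it is available as soon as the article is published, which fits the abstract's point that the approach works even when r has few or no incoming citations.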

17.
MiSearch is an adaptive biomedical literature search tool that ranks citations based on a statistical model for the likelihood that a user will choose to view them. Citation selections are automatically acquired during browsing and used to dynamically update a likelihood model that includes authorship, journal and PubMed indexing information. The user can optionally elect to include or exclude specific features and vary the importance of timeliness in the ranking. AVAILABILITY: http://misearch.ncibi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
H B Jenson. BioTechniques, 1989, 7(6):590-592
A novel computer database program dedicated to storing, cataloging, and accessing information about recombinant clones and libraries has been developed for the IBM (or compatible) personal computer. This program, named CLONES, also stores information about bacterial strains and plasmid and bacteriophage vectors used in molecular biology. The advantages of this method are improved organization of data, fast and easy assimilation of new data, automatic association of new data with existing data, and rapid retrieval of desired records using search criteria specified by the user. Individual records are indexed in the database using B-trees, which automatically index new entries and expedite later access. The use of multiple windows, pull-down menus, scrolling pick-lists, and field-input techniques makes the program intuitive to understand and easy to use. Daughter databases can be created to include all records of a particular type, or only those records matching user-specified search criteria. Separate databases can also be merged into a larger database. This computer program provides an easy-to-use and accurate means to organize, maintain, access, and share information about recombinant clones and other laboratory products of molecular biology technology.

19.
Bioinformatics has emerged as an integral part of life sciences and biomedical research. The bioinformatics tools developed so far exist individually and do not cross-talk, leading biologists to spend more time formatting the output of one tool as input for another. This leads to a huge loss of time and money. We have therefore built a platform that integrates the tools so that the output of one program can be used directly as the input of another without any modification. Tools for similarity search, primer design, and restriction enzyme digestion are required in almost all biological research; therefore, we initially integrated these tools. The BioParisodhana platform reduces the time spent browsing and downloading applications and is interactive, effective, and user friendly. AVAILABILITY: The database is available for free at http://resource.ibab.ac.in/bioparishodhana.html.

20.
MOTIVATION: Blast programs are very efficient in finding relatively strong similarities but some very distantly related sequences are given a very high Expect value and are ranked very low in Blast results. We have developed Ballast, a program to predict local maximum segments (LMSs, i.e. sequence segments conserved relative to their flanking regions) from a single Blast database search and to highlight these divergent homologues. The TBlastN database searches can also be processed with the help of information from a joint BlastP search. RESULTS: We have applied the Ballast algorithm to BlastP searches performed with sequences belonging to well described dispersed families (aminoacyl-tRNA synthetases; helicases) against the SwissProt 38 database. We show that Ballast is able to build an appropriate conservation profile and that LMSs are predicted that are consistent with the signatures and motifs described in the literature. Furthermore, by comparing the Blast, PsiBlast and Ballast results obtained on a well defined database of structurally related sequences, we show that the LMSs provide a scoring scheme that can concentrate on top ranking distant homologues better than Blast. Using the graphical user interface available on the Web, specific LMSs may be selected to detect divergent homologues sharing the corresponding properties with the query sequence without requiring any additional database search.
