Similar Literature
20 similar records found (search time: 15 ms)
1.
Literature search is a process in which external developers provide alternative representations for efficient data mining of the biomedical literature, such as ranking search results, displaying summarized semantic knowledge and clustering results into topics. In clustering search results, prominent vocabularies such as GO (Gene Ontology) and MeSH (Medical Subject Headings), as well as frequent terms extracted from retrieved PubMed abstracts, have been used as topics for grouping. In this study, we propose the FNeTD (Frequent Nearer Terms of the Domain) method for clustering PubMed abstracts. This is achieved through a two-step process: i) identifying frequent words or phrases in the abstracts through a frequent multi-word extraction algorithm, and ii) identifying nearer terms of the domain from the extracted frequent phrases using a nearest-neighbors search. The efficiency of clustering PubMed abstracts using nearer terms of the domain was measured with the F-score. The present study suggests that nearer terms of the domain can be used for clustering search results.
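The two-step scheme described above (frequent-term extraction, then grouping abstracts by shared terms, evaluated with an F-score) can be sketched in a few lines of Python. This is a toy illustration, not the authors' FNeTD implementation: the sample abstracts, stop-word list, support threshold and one-term-per-cluster grouping are all assumptions made here for brevity.

```python
from collections import Counter

STOP = {"the", "of", "a", "in", "and", "for", "to", "is"}

def frequent_terms(abstracts, min_support=2):
    """Step i: keep terms that occur in at least `min_support` abstracts."""
    df = Counter()
    for text in abstracts:
        df.update({w for w in text.lower().split() if w not in STOP})
    return {t for t, c in df.items() if c >= min_support}

def cluster_by_term(abstracts, terms):
    """Group each abstract under every frequent term it contains."""
    clusters = {t: set() for t in terms}
    for i, text in enumerate(abstracts):
        words = set(text.lower().split())
        for t in terms & words:
            clusters[t].add(i)
    return clusters

def f_score(cluster, gold):
    """Harmonic mean of precision and recall of a cluster vs. a gold class."""
    tp = len(cluster & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(cluster), tp / len(gold)
    return 2 * p * r / (p + r)

abstracts = [
    "gene expression in cancer cells",
    "cancer gene mutation analysis",
    "protein folding dynamics",
    "protein structure prediction",
]
terms = frequent_terms(abstracts)
clusters = cluster_by_term(abstracts, terms)
print(f_score(clusters["cancer"], {0, 1}))  # 1.0
```

With the toy corpus, "cancer" and "protein" each recover their intended group exactly, so the F-score against the gold labels is 1.0.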

2.
The large body of knowledge about Escherichia coli makes it a useful model organism for the expression of heterologous proteins. Proteomic studies have helped to elucidate the complex cellular responses of E. coli and facilitated its use in a variety of biotechnology applications. Knowledge of basic cellular processes provides the means for better control of heterologous protein expression. Beyond such important applications, E. coli is an ideal organism for testing new analytical technologies because of the extensive knowledge base available about the organism. For example, improved technology for characterization of unknown proteins using mass spectrometry has made two-dimensional electrophoresis (2DE) studies more useful and more rewarding, and much of the initial testing of novel protocols is based on well-studied samples derived from E. coli. These techniques have facilitated the construction of more accurate 2DE maps. In this review, we present work that led to the 2DE databases, including a new map based on tandem time-of-flight (TOF) mass spectrometry (MS); describe cellular responses relevant to biotechnology applications; and discuss some emerging proteomic techniques.

3.
The generation of proteomic data is becoming ever more high throughput. Both the technologies and experimental designs used to generate and analyze data are becoming increasingly complex. The need for methods by which such data can be accurately described, stored and exchanged between experimenters and data repositories has been recognized. Work by the Proteome Standards Initiative of the Human Proteome Organization has laid the foundation for the development of standards by which experimental design can be described and data exchange facilitated. The Minimum Information About a Proteomic Experiment data model describes both the scope and purpose of a proteomics experiment and encompasses the development of more specific interchange formats such as the mzData model of mass spectrometry. The eXtensible Markup Language-MI (XML-MI) data interchange format, which allows the exchange of molecular interaction data, has already been published, and major databases in this field supply data downloads in this format.
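The value of an agreed XML interchange format is that any tool can serialize and parse interaction records without custom converters. The sketch below shows the round-trip idea with Python's standard library; the element names and accession numbers are illustrative placeholders only, not the actual PSI-MI XML schema, which is far richer.

```python
import xml.etree.ElementTree as ET

def interaction_to_xml(interaction_id, participants):
    """Serialize one molecular interaction as a minimal XML record."""
    root = ET.Element("interaction", attrib={"id": interaction_id})
    for accession in participants:
        ET.SubElement(root, "participant", attrib={"protein": accession})
    return ET.tostring(root, encoding="unicode")

def participants_from_xml(xml_text):
    """Recover the participant list from the XML record."""
    root = ET.fromstring(xml_text)
    return [p.get("protein") for p in root.findall("participant")]

# Placeholder accessions for a two-protein interaction.
doc = interaction_to_xml("I1", ["P04637", "Q00987"])
print(participants_from_xml(doc))  # ['P04637', 'Q00987']
```

Because producer and consumer agree on the element vocabulary, a database can export such records in bulk and any downstream tool can re-ingest them losslessly.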

4.
Contemporary proteomics, currently in its exponential growth phase, is a bewildering array of tools. Proteomic methods are the result of a convergence of rapidly improving mass spectrometry technologies, protein chemistry and separation sciences, genomics and bioinformatics. Strides are being made in improving proteomics technologies to map and measure proteomes and subproteomes. However, no single proteomic platform appears ideally suited to address all research needs or to accomplish ambitious goals satisfactorily. Nevertheless, proteomics is in a unique position to contribute to protein discovery and to public health in terms of better biomarkers, diagnostics and treatment of disease. While the potential is great, many challenges remain. Fundamental issues, such as biological variability, pre-analytic factors and analytical reproducibility, remain to be resolved. Neither an all-genetic nor an all-proteomic approach will solve biological complexity. Proteomics will be the foundation for constructing and extracting useful knowledge for pharma and biotech, depicted in the following path: data --> structured data --> information --> information architecture --> knowledge --> useful knowledge.

6.
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing research data. Such infrastructures are increasingly summarized under the term research data repositories (RDR). The project re3data.org (Registry of Research Data Repositories) began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. As of July 2013, re3data.org lists more than 400 research data repositories, 288 of which are described in detail using the re3data.org vocabulary. Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data. This article describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDRs. Further, the article outlines the features of re3data.org and shows how this registry helps to identify appropriate repositories for the storage and search of research data.

7.
The vast amount of data produced by today’s medical imaging systems has led medical professionals to turn to novel technologies in order to efficiently handle their data and exploit the rich information present in them. In this context, artificial intelligence (AI) is emerging as one of the most prominent solutions, promising to revolutionise everyday clinical practice and medical research. The pillar supporting the development of reliable and robust AI algorithms is the appropriate preparation of the medical images to be used by the AI-driven solutions. Here, we provide a comprehensive guide to the steps necessary to prepare medical images prior to developing or applying AI algorithms. The main steps in a typical medical image preparation pipeline are: (i) image acquisition at clinical sites, (ii) image de-identification to remove personal information and protect patient privacy, (iii) data curation to control for image and associated information quality, (iv) image storage, and (v) image annotation. A plethora of open-access tools exists to perform each of the aforementioned tasks, and these are reviewed here. Furthermore, we detail medical image repositories covering different organs and diseases; such repositories are constantly growing and being enriched with the advent of big data. Lastly, we offer directions for future work in this rapidly evolving field.
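Step (ii) of the pipeline, de-identification, can be illustrated with a short Python sketch that strips direct identifiers from an image's metadata while keeping a stable pseudonym so that multiple scans of one patient remain linkable. The field names and the bare SHA-256 pseudonym are assumptions for illustration; a real pipeline would follow a full DICOM de-identification profile and use a salted hash or lookup table.

```python
import hashlib

# Illustrative identifier fields only, not a complete PHI profile.
PHI_FIELDS = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}

def deidentify(metadata, keep_pseudonym=True):
    """Drop direct identifiers; optionally add a stable pseudonym."""
    clean = {k: v for k, v in metadata.items() if k not in PHI_FIELDS}
    if keep_pseudonym:
        digest = hashlib.sha256(metadata.get("PatientID", "").encode()).hexdigest()
        clean["PseudoID"] = digest[:12]
    return clean

record = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "Modality": "MR", "StudyDate": "20240101"}
print(deidentify(record))
```

The clinically useful fields (modality, study date) survive, while name and ID are removed before the data leave the acquisition site.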

8.
Proteomic analysis is not limited to the analysis of serum or tissues. Synovial, peritoneal, pericardial and cerebrospinal fluid represent unique proteomes for disease diagnosis and prognosis. In particular, cerebrospinal fluid serves as a rich source of putative biomarkers that are not solely limited to neurologic disorders. Peptides, proteolytic fragments and antibodies are capable of crossing the blood-brain barrier, thus providing a repository of pathologic information. Proteomic technologies such as immunoblotting, isoelectric focusing, 2D gel electrophoresis and mass spectrometry have proven useful for deciphering this unique proteome. Cerebrospinal fluid proteins are generally less abundant than their corresponding serum counterparts, necessitating the development and use of sensitive analytical techniques. This review highlights some of the promising areas of cerebrospinal fluid proteomic research and their clinical applications.

9.
The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual ‘tokens’ from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement over existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from the Jackson Laboratory. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.
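The core move in tools of this kind is to turn a set of abstracts into ranked, human-readable terms. A minimal sketch of that idea, using plain TF-IDF term ranking rather than the full LSI/POS-tagging pipeline the abstract describes, might look as follows; the sample sentences and stop-word list are assumptions made here for illustration.

```python
import math
from collections import Counter

STOP = {"the", "a", "and", "in", "of", "to"}

def tokenize(text):
    """Lowercase, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP]

def rank_terms(docs, top_n=3):
    """Rank words by summed TF-IDF across a set of abstracts."""
    token_docs = [tokenize(d) for d in docs]
    df = Counter()
    for d in token_docs:
        df.update(set(d))          # document frequency
    n = len(token_docs)
    scores = Counter()
    for d in token_docs:
        tf = Counter(d)
        for w, c in tf.items():
            scores[w] += (c / len(d)) * math.log(n / df[w])
    return [w for w, _ in scores.most_common(top_n)]

docs = [
    "TP53 regulates apoptosis and the cell cycle.",
    "BAX promotes apoptosis in stressed cells.",
    "Cyclins drive the cell cycle forward.",
]
print(rank_terms(docs))
```

Words shared by every abstract get an IDF of zero and fall out of the ranking, so the surviving top terms are the ones that discriminate between documents.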

10.
Mass spectrometry (MS) coupled to affinity purification is a powerful approach for identifying protein-protein interactions and for mapping post-translational modifications. Prior to MS analysis, affinity-purified proteins are typically separated by gel electrophoresis, visualized with a protein stain, excised, and subjected to in-gel digestion. An inherent limitation of this series of steps is the loss of protein sample that occurs during gel processing. Although methods employing in-solution digestion have been reported, they generally suffer from poor reaction kinetics. In the present study, we demonstrate an application of a microfluidic processing device, termed the Proteomic Reactor, for enzymatic digestion of affinity-purified proteins for liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. Use of the Proteomic Reactor enabled the identification of numerous ubiquitinated proteins in a human cell line expressing reduced amounts of the ubiquitin-dependent chaperone, valosin-containing protein (VCP). The Proteomic Reactor is a novel technology that facilitates the analysis of affinity-purified proteins and has the potential to aid future biological studies.
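Whatever the digestion chemistry (in-gel, in-solution, or microfluidic), downstream LC-MS/MS identification relies on predicting the peptides trypsin should produce. The standard in silico rule, cleave C-terminal to K or R except when the next residue is P, is easy to sketch; the sequences below are illustrative, not from the study.

```python
def tryptic_peptides(sequence, missed_cleavages=0):
    """In silico trypsin digest: cleave after K/R unless followed by P."""
    sites = [0] + [i + 1 for i, aa in enumerate(sequence[:-1])
                   if aa in "KR" and sequence[i + 1] != "P"] + [len(sequence)]
    peptides = []
    for i in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides

print(tryptic_peptides("MKWVTFISLLFK"))  # ['MK', 'WVTFISLLFK']
```

Allowing one missed cleavage additionally yields the undigested "MKWVTFISLLFK", mirroring the incomplete digestion that poor reaction kinetics produce in practice.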

11.
R Lawson, BioTechniques 1990, 8(6):680-683
PaperChase is a computer program that provides an efficient interface to the National Library of Medicine's MEDLINE database of references to the biomedical literature. The database includes references (citations) and abstracts compiled from Index Medicus, the International Nursing Index and the Index to Dental Literature. PaperChase may be accessed from any computer terminal or personal computer with a modem. No special knowledge of computers or biomedical terms is necessary. Simple menus enable the novice to search the biomedical literature without training, while a command language speeds searching for the experienced user. PaperChase does not require the user to know the database's indexing terminology, called Medical Subject Headings. Everyday language may be used, and PaperChase will translate, or "map", the user's search term into the required Medical Subject Heading. PaperChase monitors a search in progress and suggests additional Medical Subject Headings which can be used to broaden or narrow a search. The searcher can order a full-text photocopy of any reference found in PaperChase. Support documentation and a subscriber newsletter are provided at no charge. Trained search specialists are available to offer assistance and to answer questions.
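The "mapping" of everyday language onto controlled headings, plus suggesting broader or narrower headings, can be sketched with a small lookup table. The synonym entries and tree relationships below are illustrative assumptions, not actual MeSH data or PaperChase internals.

```python
# Toy synonym table: everyday phrase -> controlled heading.
SYNONYMS = {
    "heart attack": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}

# Toy hierarchy: heading -> (broader heading, narrower heading).
TREE = {
    "Myocardial Infarction": ("Myocardial Ischemia",
                              "ST Elevation Myocardial Infarction"),
}

def map_term(user_term):
    """Map an everyday term to a heading and suggest related headings."""
    heading = SYNONYMS.get(user_term.lower())
    if heading is None:
        return None, []
    broader, narrower = TREE.get(heading, (None, None))
    hints = [h for h in (broader, narrower) if h]
    return heading, hints

print(map_term("Heart attack"))
```

A real system would back the lookup with the full vocabulary and its tree numbers, but the interaction pattern (translate, then suggest ways to broaden or narrow) is the same.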

12.
Proteomics, analogous to genomics, is the analysis of the protein complement present in a cell, organ, or organism at any given time. While the genome provides information about the theoretical status of the cellular proteins, the proteome describes the actual content, which ultimately determines the phenotype. The broad application of proteomic technologies in basic science and clinical medicine has the potential to accelerate our understanding of the molecular mechanisms underlying disease and may facilitate the discovery of new drug targets and diagnostic disease markers. Proteomics is a rapidly developing and changing scientific discipline, and the last five years have seen major advances in the underlying techniques as well as expansion into new applications. Core technologies for the separation of proteins and/or peptides are one- and two-dimensional gel electrophoresis and one- and two-dimensional liquid chromatography, and these are coupled almost exclusively with mass spectrometry. Proteomic studies have shown that the most effective analysis of even simple biological samples requires subfractionation and/or enrichment before protein identification by mass spectrometry. Selection of the appropriate technology or combination of technologies to match the biological questions is essential for maximum coverage of the selected subproteome and to ensure both the full interpretation and the downstream utility of the data. In this review, we describe the current technologies for proteome fractionation and separation of biological samples, based on our lab workflow for biomarker discovery and validation.

13.
Progress in MS-based methods for veterinary research and diagnostics lags behind that for human research, and proteome data from domestic animals are still not well represented in open-source data repositories. This is particularly true for the equine species. Here we present a first Equine PeptideAtlas encompassing high-resolution tandem MS analyses of 51 samples representing a selection of equine tissues and body fluids from healthy and diseased animals. The raw data were processed through the Trans-Proteomic Pipeline to yield high-quality identification of proteins and peptides. The current release comprises 24,131 distinct peptides representing 2,636 canonical proteins observed at false discovery rates of 0.2% at the peptide level and 1.4% at the protein level. Data from the Equine PeptideAtlas are available for experimental planning, validation of new datasets, and as a proteomic data mining resource. The advantages of the Equine PeptideAtlas are demonstrated by examples of mining the contents for information on potential and well-known equine acute phase proteins, which are of extensive general interest in the veterinary clinic. The extracted information will support further analyses, and emphasizes the value of the Equine PeptideAtlas as a resource for the design of targeted quantitative proteomic studies.
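The peptide- and protein-level false discovery rates quoted above are conventionally estimated with the target-decoy approach: search the spectra against real and reversed (decoy) sequences and take the decoy fraction above a score threshold as the error estimate. A minimal sketch of that idea, with made-up scores (the Trans-Proteomic Pipeline uses more sophisticated probabilistic models):

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR at a score threshold as decoys / targets above it.

    `psms` is a list of (score, is_decoy) peptide-spectrum matches.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 1.0

# Illustrative scores only, not data from the Equine PeptideAtlas.
psms = [(0.99, False), (0.95, False), (0.90, False),
        (0.88, True), (0.85, False), (0.40, True)]
print(target_decoy_fdr(psms, 0.8))  # 1 decoy / 4 targets = 0.25
```

Raising the threshold until the estimate drops below a chosen level (e.g. 0.2%) is how a release like this fixes its reported FDR.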

14.
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments have been resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories for the research community. This paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, the benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.

15.
Plant seed storage proteins were among the first proteins to be isolated (20); however, only recently, as a result of using molecular biology techniques, have the amino acid sequences of many of these proteins been determined. With the accumulation of amino acid sequence data for many vicilin-type storage proteins much has been learned concerning the location of conserved amino acid regions and other regions which can tolerate amino acid sequence variation. Combining this knowledge with recent advances in plant gene transfer technologies will allow molecular biologists to correct (by using amino acid replacement mutations) the sulfur amino acid deficiency inherent to bean seed storage proteins. The development of more nutritious soybean and common bean seeds will be of benefit to programs involving human and animal nutrition.

16.
Conditional control of plant cell function and development relies on appropriate signal perception, signal integration and processing. The development of high-throughput technologies such as proteomics and interactomics has enabled the identification of protein interaction networks that mediate signal processing from inputs to appropriate outputs. Such networks can be depicted in graphical representations using nodes and edges, allowing for the immediate visualization and analysis of the network's topology. Hubs are network elements characterized by many edges (often degree k ≥ 5), which confers topological importance on them. This review introduces the concepts of networks, hubs and bottlenecks and describes four examples from plant science in more detail: hubs in the redox regulatory network of the chloroplast (ferredoxin, thioredoxin and peroxiredoxin), in mitogen-activated protein (MAP) kinase signal processing, in photomorphogenesis (the COP9 signalosome, COP1 and CDD), and in monomeric GTPase function. Some guidance is provided to appropriate internet resources, web repositories, databases and their use. Plant networks can be generated from existing public databases, and this type of analysis is valuable in supporting existing hypotheses or in generating new concepts and ideas. However, intensive manual curation of in silico networks is still necessary.
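The degree-based hub definition above is straightforward to apply once a network is represented as an edge list. The sketch below builds an adjacency structure and flags nodes meeting the k ≥ 5 cutoff; the thioredoxin-centred edge list is an illustrative toy, not curated interaction data.

```python
from collections import defaultdict

def degrees(edges):
    """Undirected node degrees from an edge list of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return {n: len(nbrs) for n, nbrs in adj.items()}

def hubs(edges, k=5):
    """Nodes with degree >= k (a common, if somewhat arbitrary, cutoff)."""
    return {n for n, d in degrees(edges).items() if d >= k}

# Toy edge list inspired by the chloroplast redox example.
edges = [("TRX", t) for t in ("PRX1", "PRX2", "FBPase", "MDH", "SBPase")]
edges += [("PRX1", "H2O2")]
print(sorted(hubs(edges)))  # ['TRX']
```

Lowering k exposes progressively less connected nodes, which is one reason the review stresses that topological cutoffs need manual, biological sanity-checking.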

17.
The Web has become the major medium for various communities to share their knowledge, and to this end it provides an optimal environment for knowledge networks. The Web offers global connectivity that is virtually instantaneous, and its resources and documents can easily be indexed for searching. In the coupled realms of biomedical research and healthcare this has become especially important: many thousands of communities already exist that connect across academia, hospitals and industry. These communities also rely on several forms of knowledge assets, including publications, experimental data, domain-specific vocabularies and policies. Web-based communities will be among the earliest beneficiaries of the emerging Semantic Web. With the new standards and technologies of the Semantic Web, effective utilization of knowledge networks will expand profoundly, fostering new levels of innovation and knowledge.

18.
In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
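Peptide unicity, in the sense used above, means a peptide maps to exactly one protein entry in the chosen search space; shared peptides (e.g. between a canonical sequence and its isoforms) cannot distinguish their parents. A minimal sketch with hypothetical digests (the accessions and peptide strings below are invented for illustration):

```python
from collections import defaultdict

def peptide_unicity(protein_peptides):
    """Return peptides that map to exactly one protein in the data set.

    `protein_peptides` maps each protein accession to its peptide set.
    """
    owners = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            owners[pep].add(protein)
    return {pep: prots for pep, prots in owners.items() if len(prots) == 1}

# Hypothetical digests: the isoform shares one peptide with the canonical entry.
digests = {
    "P12345":   {"LSSPATLNSR", "VGYIELDR"},
    "P12345-2": {"LSSPATLNSR", "AQNEWK"},
}
unique = peptide_unicity(digests)
print(sorted(unique))  # ['AQNEWK', 'VGYIELDR']
```

This is why the choice of data set matters for bottom-up workflows: adding isoforms or variant sequences enlarges the search space but can demote previously unique peptides to shared ones.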

19.
20.
Recent and historical data suggest that the interaction of antigenic materials, including food proteins, with the mucosal immune system is an important component of certain diseases, causing either an important manifestation of the disease or the disease itself. Whether existing knowledge concerning the digestion and absorption of dietary proteins, the disposition of absorbed antigens, and potential adverse effects is adequate to meet the requirements of a safety evaluation is addressed. Currently, the immunological consequences of introducing new food proteins (e.g., leaf and bean protein concentrates), new processing technologies (food irradiation, chemical sterilization), and changes in traditional foods through emerging technologies (genetic engineering) can be neither predicted nor routinely measured.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号