1.
2.
Benjamin Schmid Johannes Schindelin Albert Cardona Mark Longair Martin Heisenberg 《BMC bioinformatics》2010,11(1):274
Background
Current imaging methods such as Magnetic Resonance Imaging (MRI), Confocal microscopy, Electron Microscopy (EM) or Selective Plane Illumination Microscopy (SPIM) yield three-dimensional (3D) data sets in need of appropriate computational methods for their analysis. The reconstruction, segmentation and registration are best approached from the 3D representation of the data set.
3.
Reisinger F Krishna R Ghali F Ríos D Hermjakob H Vizcaíno JA Jones AR 《Proteomics》2012,12(6):790-794
We present a Java application programming interface (API), jmzIdentML, for the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) mzIdentML standard for peptide and protein identification data. The API combines the power of Java Architecture for XML Binding (JAXB) and an XPath-based random-access indexer to allow a fast and efficient mapping of extensible markup language (XML) elements to Java objects. The internal references in the mzIdentML files are resolved in an on-demand manner, where the whole file is accessed as a random-access swap file, and only the relevant piece of XML is selected for mapping to its corresponding Java object. The API is highly efficient in its memory usage and can handle files of arbitrary sizes. The API follows the official release of the mzIdentML (version 1.1) specifications and is available in the public domain under a permissive licence at http://www.code.google.com/p/jmzidentml/.
4.
Early detection of cancer using biomarkers obtained from blood or other easily accessible tissues would have a significant impact on reducing cancer mortality. However, identifying new blood-based biomarkers has been hindered by the dynamic complexity of the human plasma proteome, confounded by genetic and environmental variability, and the scarcity of high quality controlled samples. In this report, we discuss a new paradigm for biomarker discovery through the use of mouse models. Inbred mouse models of cancer recapitulate many critical features of human cancer, while eliminating sources of environmental and genetic variability. The ability to collect samples from highly matched cases and controls under identical conditions further reduces variability which is critical for successful biomarker discovery. We describe the establishment of a repository containing tumor, plasma, urine, and other tissues from 10 different mouse models of human cancer, including two breast, two lung, two prostate, two gastrointestinal, one ovarian, and one skin tumor model. We present the overall design of this resource and its potential use by the research community for biomarker discovery.
5.
Background
Detection and quantification of cyclic alternating pattern (CAP) components has the potential to serve as a disease biomarker. Few methods exist to discriminate all the different CAP components; those that do lack adequate sensitivity and are often evaluated by accuracy (AC), which is not an appropriate measure for imbalanced datasets.
Methods
We describe a knowledge discovery in data (KDD) methodology aimed at developing automatic CAP scoring approaches. Automatic CAP scoring was approached from two perspectives: the binary distinction between A-phases and B-phases, and the multi-class classification of the different CAP components. The most important KDD stages are: extraction of 55 features, feature ranking/transformation, and classification. Classification is performed by (i) support vector machine (SVM), (ii) k-nearest neighbors (k-NN), and (iii) discriminant analysis. We report the weighted accuracy (WAC), which accounts for class imbalance.
Results
The study includes 30 subjects from the CAP Sleep Database of Physionet. The best alternative for the discrimination of the different A-phase subtypes involved feature ranking by the minimum redundancy maximum relevance algorithm (mRMR) and classification by SVM, with a WAC of 51%. Concerning the binary discrimination between A-phases and B-phases, k-NN with mRMR ranking achieved the best WAC of 80%.
Conclusions
We describe a KDD methodology that, to the best of our knowledge, is applied here to CAP scoring for the first time. In particular, the full discrimination of the three different A-phase subtypes is a new perspective, since past works tried multi-class approaches based on grouping of different subtypes. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. Overall, better subtype sensitivities than other published approaches were achieved.
6.
PedVizApi is a Java API (application program interface) for the visual analysis of large and complex pedigrees. It provides all the necessary functionality for the interactive exploration of extended genealogies. While available packages are mostly focused on a static representation or cannot be added to an existing application, PedVizApi is a highly flexible open source library for the efficient construction of visual-based applications for the analysis of family data. An extensive demo application and an R interface are provided. AVAILABILITY: http://www.pedvizapi.org
7.
Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focuses on molecular functions of HIV-infected T-cells and HIV epitope information, using text mining and the interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through text mining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.
8.
Helsens K Brusniak MY Deutsch E Moritz RL Martens L 《Journal of proteome research》2011,10(11):5260-5263
We here present jTraML, a Java API for the Proteomics Standards Initiative TraML data standard. The library provides fully functional classes for all elements specified in the TraML XSD document, as well as convenient methods to construct controlled vocabulary-based instances required to define SRM transitions. The use of jTraML is demonstrated via a two-way conversion tool between TraML documents and vendor specific files, facilitating the adoption process of this new community standard. The library is released as open source under the permissive Apache2 license and can be downloaded from http://jtraml.googlecode.com . TraML files can also be converted online at http://iomics.ugent.be/jtraml .
9.
Current analyses of co-expressed genes are often based on global approaches such as clustering or bi-clustering. An alternative is to employ local methods and search for patterns, that is, sets of genes displaying specific expression properties in a set of situations. The main bottleneck of this type of analysis is twofold: computational costs and an overwhelming number of candidate patterns which can hardly be further exploited. A timely application of background knowledge available in literature databases, biological ontologies and other sources can help to focus on the most plausible patterns only. The paper proposes, implements and tests a flexible constraint-based framework that enables the effective mining and representation of meaningful over-expression patterns representing intrinsic associations among genes and biological situations. The framework can be simultaneously applied to a wide spectrum of genomic data and we demonstrate that it allows the generation of new biological hypotheses with clinical implications.
10.
11.
Humans can categorize objects in complex natural scenes within 100-150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ~6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (~100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.
12.
We here present jmzML, a Java API for the Proteomics Standards Initiative mzML data standard. Based on the Java Architecture for XML Binding and XPath‐based XML indexer random‐access XML parser, jmzML can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language. jmzML also automatically resolves internal XML references on‐the‐fly. The library (which includes a viewer) can be downloaded from http://jmzml.googlecode.com .
13.
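A dynamic knowledge model for fertilizer management in rapeseed (total citations: 6, self-citations: 0, citations by others: 6)
By analyzing and distilling the latest research on fertilization management in rapeseed, and applying the principle of nutrient balance on the basis of target yield and soil physical and chemical properties, the authors built a systematic and widely applicable dynamic knowledge model for rapeseed fertilizer management. The model can be used to quantify precisely, under different environmental conditions, the total amounts of nitrogen, phosphorus, potassium, and boron to apply during rapeseed production, the ratio of organic to inorganic nitrogen, and the ratios of basal to topdressed nitrogen, phosphorus, and potassium. Case studies using long-term weather data from three ecological sites (Nanjing, Yizheng, and Rugao), different soil types, and different target yields showed that the model overall performs well in decision support and applicability.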
14.
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses, that is, indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
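The abstract above does not spell out how BFSP works, so the following is only a minimal sketch of one plausible reading: a breadth-first search over a probability-weighted graph that abandons any path whose cumulative probability drops below a threshold. The function name, the parameters `min_prob` and `max_depth`, the rule of multiplying edge probabilities along a path, and the toy network are all assumptions made for illustration, not the authors' implementation.

```python
from collections import deque

# Hypothetical toy network: node -> {neighbour: edge probability}.
EXAMPLE_NETWORK = {
    "drugX": {"proteinA": 0.9},
    "proteinA": {"pathwayP": 0.8, "proteinB": 0.3},
    "pathwayP": {"diseaseD": 0.7},
    "proteinB": {"diseaseD": 0.2},
}


def bfs_with_pruning(graph, source, min_prob=0.1, max_depth=3):
    """Return {target: (probability, path)} for indirect links from source.

    Assumption: a path's probability is the product of its edge
    probabilities, and a path is pruned once that product falls
    below min_prob or the path reaches max_depth edges.
    """
    best = {}
    queue = deque([(source, 1.0, [source])])
    direct = graph.get(source, {})
    while queue:
        node, prob, path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue  # depth limit reached, do not expand further
        for neighbour, edge_prob in graph.get(node, {}).items():
            if neighbour in path:
                continue  # avoid cycles
            new_prob = prob * edge_prob
            if new_prob < min_prob:
                continue  # prune low-probability paths
            new_path = path + [neighbour]
            # An "indirect" relationship spans at least two edges and
            # is not already a direct neighbour of the source.
            if len(new_path) > 2 and neighbour not in direct:
                if neighbour not in best or new_prob > best[neighbour][0]:
                    best[neighbour] = (new_prob, new_path)
            queue.append((neighbour, new_prob, new_path))
    return best


if __name__ == "__main__":
    for target, (prob, path) in bfs_with_pruning(EXAMPLE_NETWORK, "drugX").items():
        print(target, round(prob, 3), " -> ".join(path))
```

Keeping only the highest-probability path per target loosely mirrors the idea of a most probable path, but a real MPP algorithm would more likely use a shortest-path search on negative log probabilities.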
15.
Drug discovery is the process of new drug identification. This process is driven by the increasing data from existing chemical libraries and data banks. The knowledge graph is introduced to the domain of drug discovery for imposing an explicit structure to integrate heterogeneous biomedical data. The graph can provide structured relations among multiple entities and unstructured semantic relations associated with entities. In this review, we summarize knowledge graph-based works that implement drug repurposing and adverse drug reaction prediction for drug discovery. As knowledge representation learning is a common way to explore knowledge graphs for prediction problems, we introduce several representative embedding models to provide a comprehensive understanding of knowledge representation learning.
16.
Rzhetsky A Koike T Kalachikov S Gomez SM Krauthammer M Kaplan SH Kra P Russo JJ Friedman C 《Bioinformatics (Oxford, England)》2000,16(12):1120-1128
MOTIVATION: In order to aid in hypothesis-driven experimental gene discovery, we are designing a computer application for the automatic retrieval of signal transduction data from electronic versions of scientific publications using natural language processing (NLP) techniques, as well as for visualizing and editing representations of regulatory systems. These systems describe both signal transduction and biochemical pathways within complex multicellular organisms, yeast, and bacteria. This computer application in turn requires the development of a domain-specific ontology, or knowledge model. RESULTS: We introduce an ontological model for the representation of biological knowledge related to regulatory networks in vertebrates. We outline a taxonomy of the concepts, define their 'whole-to-part' relationships, describe the properties of major concepts, and outline a set of the most important axioms. The ontology is partially realized in a computer system designed to aid researchers in biology and medicine in visualizing and editing a representation of a signal transduction system.
17.
Holger Fröhlich 《Biometrical journal. Biometrische Zeitschrift》2014,56(2):287-306
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step toward a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures combined with the difficulty of achieving a clear biological interpretation. For this reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Most of these methods focus on classification problems, that is, learning a model from data that assigns patients to distinct clinical groups. Far less has been published on approaches that predict a patient's event risk. In this paper, we investigate eight methods that integrate network information into multivariable Cox proportional hazard models for risk prediction in breast cancer. We compare the prediction performance of our tested algorithms via cross‐validation as well as across different datasets. In addition, we highlight the stability and interpretability of obtained gene signatures. In conclusion, we find GeneRank‐based filtering to be a simple, computationally cheap and highly predictive technique to integrate network information into event time prediction models. Signatures derived via this method are highly reproducible.
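For readers unfamiliar with GeneRank, the sketch below illustrates the general idea as a PageRank-style iteration, r = (1 - d) * ex + d * W * D^-1 * r, where ex is a normalised per-gene score, W the network adjacency matrix and D the diagonal degree matrix. The abstract does not say how the ranking feeds into the Cox models, so the `top_genes` filter, the damping value d = 0.5, the fixed iteration count, and the toy network are assumptions for illustration only.

```python
def generank(adjacency, expression, d=0.5, iterations=100):
    """Iterate r = (1 - d) * ex + d * W * D^-1 * r on an undirected network.

    adjacency: gene -> set of neighbouring genes (assumed symmetric).
    expression: gene -> non-negative score (e.g. absolute fold change).
    """
    genes = list(expression)
    total = sum(expression.values())
    ex = {g: expression[g] / total for g in genes}  # normalise to sum 1
    degree = {g: len(adjacency.get(g, ())) for g in genes}
    rank = dict(ex)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            # Each neighbour passes on its rank divided by its degree.
            inflow = sum(rank[n] / degree[n] for n in adjacency.get(g, ()))
            updated[g] = (1 - d) * ex[g] + d * inflow
        rank = updated
    return rank


def top_genes(rank, k):
    """Hypothetical filter step: keep the k highest-ranked genes."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical star-shaped network: gene A is connected to B, C and D.
EXAMPLE_ADJACENCY = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
EXAMPLE_EXPRESSION = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}

if __name__ == "__main__":
    ranks = generank(EXAMPLE_ADJACENCY, EXAMPLE_EXPRESSION)
    print(top_genes(ranks, 2))
```

With equal expression scores, the hub gene ends up ranked first purely because of its connectivity, which is the property that makes network-based filtering differ from ranking by expression alone.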
18.
19.
Guannan He Yanchun Liang Yan Chen William Yang Jun S. Liu Mary Qu Yang Renchu Guan 《BMC systems biology》2018,12(7):116
Background
Because of the huge economic burden they place on society, obesity and diabetes have become some of the most serious public health challenges in the world. To reveal the close and complex relationships between diabetes, obesity and other diseases, and to search for effective treatments, we present a novel topic model named representative latent Dirichlet allocation (RLDA).
Results
RLDA was applied to a corpus of more than 337,000 publications on diabetes and obesity published from 2007 to 2016. To unveil meaningful relationships between diabetes mellitus, obesity and other diseases, we performed an explicit analysis of the output of our model with a series of visualization tools. We then checked the credibility of our discoveries against clinical reports that were not used in the training data, and found that a sufficient number of these records matched directly. Our results illustrate that in the last 10 years, for diseases accompanying obesity, scientists and researchers have mainly focused on 17 of them, such as asthma, gastric disease and heart disease; the study of diabetes mellitus features a broader scope of 26 diseases, such as Alzheimer’s disease and heart disease; and 15 accompanying diseases are common to both: adrenal disease, anxiety, cardiovascular disease, depression, heart disease, hepatitis, hypertension, hypothalamic disease, respiratory disease, myocardial infarction, OSAS, liver disease, lung disease, schizophrenia and tuberculosis. In addition, tumor necrosis factor, tumor, adolescent obesity or diabetes, inflammation, hypertension and cell are likely to be hot topics related to diabetes mellitus and obesity in the next few years.
Conclusions
With the help of RLDA, we obtained hotspot analysis and relation discovery results for diabetes and obesity. We extracted the significant relationships between them and other diseases such as Alzheimer’s disease, heart disease and tumor. We believe that the newly proposed representation learning algorithm can help biomedical researchers better focus their attention and optimize their research direction.
20.
An object model and database for functional genomics (total citations: 2, self-citations: 0, citations by others: 2)
Jones A Hunt E Wastling JM Pizarro A Stoeckert CJ 《Bioinformatics (Oxford, England)》2004,20(10):1583-1590
MOTIVATION: Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. RESULTS: We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. AVAILABILITY: FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.