Found 20 similar articles; search took 15 ms
1.
Background
Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation.
2.
A probabilistic generative model for GO enrichment analysis (cited by: 1 in total; self-citations: 0; citations by others: 1)
The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of multiple hypotheses that are being tested for each study. In addition, categories often overlap with both direct parents/descendants and other distant categories in the hierarchical structure. This makes it hard to determine if the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems we developed a generative probabilistic model which identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in the selected gene set and GO. Using controlled GO data our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human our method was able to correctly identify both general and specific enriched categories which were overlooked by other methods.
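For context, the multiple-hypothesis problem mentioned above arises when each category is tested on its own. The sketch below is not the generative model from the abstract; it shows the standard one-sided hypergeometric test as an assumed per-category baseline. The function name and the toy numbers are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(population, category_size, selected, overlap):
    """One-sided tail probability P(X >= overlap).

    X counts category members among `selected` genes drawn without
    replacement from `population` genes, of which `category_size`
    are annotated to the category.
    """
    upper = min(category_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes, 4 in the category, 3 selected, 2 overlap.
p_value = hypergeom_enrichment_p(population=10, category_size=4, selected=3, overlap=2)
print(round(p_value, 4))
```

Running one such test per category, across thousands of overlapping categories, is what produces the multiple-testing and redundancy issues that the abstract sets out to address.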
3.
Background
For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates.
4.
We develop a probabilistic approach to optimum reserve design based on the species-area relationship. Specifically, we focus on the distribution of areas among a set of reserves maximizing biodiversity. We begin by presenting analytic solutions for the neutral case in which all species have the same colonization probability. The optimum size distribution is determined by the local-to-regional species richness ratio k. There is a critical k(t) ratio defined by the number of reserves raised to the scaling exponent of the species-area relationship. Below k(t), a uniform area distribution across reserves maximizes biodiversity. Beyond k(t), biodiversity is maximized by allocating a certain area to one reserve and uniformly allocating the remaining area to the other reserves. We proceed by numerically exploring the robustness of our analytic results when departing from the neutral assumption of identical colonization probabilities across species.
5.
The main aim of this paper is to present a simple probabilistic model for the early stage of neuron growth: the specification of an axon out of several initially similar neurites. The model is a Markov process with competition between the growing neurites, wherein longer objects have more chances to grow, and parameter alpha determines the intensity of the competition. For alpha > 1 the model provides results which are qualitatively similar to the experimental ones, i.e. selection of one rapidly elongating axon out of several neurites while other less successful neurites stop growing at some random time. Rigorous mathematical proofs are given.
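The competition mechanism can be illustrated with a small simulation. The abstract does not give the transition rule, so this sketch assumes one: at each step a single neurite grows by one unit, chosen with probability proportional to its length raised to the power alpha. The function name, the unit growth step and the default parameters are hypothetical, and the sketch does not model neurites stopping.

```python
import random


def simulate_neurites(n_neurites=5, steps=2000, alpha=1.5, seed=0):
    """Simulate competitive growth under the assumed length**alpha rule."""
    rng = random.Random(seed)
    lengths = [1.0] * n_neurites  # all neurites start out identical
    for _ in range(steps):
        # Longer neurites get a larger share of the growth probability.
        weights = [length ** alpha for length in lengths]
        winner = rng.choices(range(n_neurites), weights=weights, k=1)[0]
        lengths[winner] += 1.0
    return lengths


lengths = simulate_neurites()
print(sorted(lengths, reverse=True))
```

Rerunning with different values of alpha is one way to explore how the strength of competition affects whether a single neurite comes to dominate.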
6.
In order to mitigate the problem of increasing model complexity with increasing number of occupation states in spatially implicit metacommunity models, the assumption of independence among species distributions is often required. In the present paper, we show that this approach only works correctly if set relations among patch occupancy states are considered adequately. This is illustrated by means of a well-known, although incorrectly formulated, predator-prey metacommunity model devised by Bascompte and Solé [1]. We demonstrate that this model shows anomalous dynamical behavior caused by inconsistency between the model formulation and its assumptions. In order to formalize our finding we develop a corrected model formulation that accounts for the principles of set theory so that the change rates of the system compartments sum to zero. Applying this method successfully rules out the occurrence of anomalous dynamical behavior found in the original model. Finally we discuss the implications of our findings for the accuracy of model predictions.
7.
A Fungean solid is derived for membranous materials as a body defined by isotropic response functions whose mathematical structure is that of a Hookean solid where the elastic constants are replaced by functions of state derived from an implicit, thermodynamic, internal energy function. The theory utilizes Biot’s (Lond Edinb Dublin Philos Mag J Sci 27:468–489, 1939) definitions for stress and strain that, in one-dimension, are the stress/strain measures adopted by Fung (Am J Physiol 28:1532–1544, 1967) when he postulated what is now known as Fung’s law. Our Fungean membrane model is parameterized against a biaxial data set acquired from a porcine pleural membrane subjected to three, sequential, proportional, planar extensions. These data support an isotropic/deviatoric split in the stress and strain-rate hypothesized by our theory. These data also demonstrate that the material response is highly nonlinear but, otherwise, mechanically isotropic. These data are described reasonably well by our otherwise simple, four-parameter, material model.
8.
A mathematical model for structure-function relations in hemoglobin (cited by: 17 in total; self-citations: 0; citations by others: 17)
9.
Humans can categorize objects in complex natural scenes within 100-150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ~6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (~100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.
10.
MOTIVATION: Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarization methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large datasets. An alternative computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding with a latent variable to capture variations in probe affinity. Although promising, the main limitations of this model are that it does not use information from multiple chips and does not account for specific binding to the mismatch (MM) probes. RESULTS: We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark datasets and a real time-course dataset, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions. AVAILABILITY: Both mgMOS and the new model multi-mgMOS have been implemented in an R package, which is available at http://www.bioinf.man.ac.uk/resources/puma.
11.
12.
Quoc-Chinh Bui Breanndán Ó Nualláin Charles A Boucher Peter MA Sloot 《BMC bioinformatics》2010,11(1):101
Background
In HIV treatment it is critical to have up-to-date resistance data of applicable drugs since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. Therefore there is an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from literature.
13.
14.
Human observers can perceive the three-dimensional (3-D) structure of their environment using various cues, an important one of which is optic flow. The motion of any point’s projection on the retina depends both on the point’s movement in space and on its distance from the eye. Therefore, retinal motion can be used to extract the 3-D structure of the environment and the shape of objects, in a process known as structure-from-motion (SFM). However, because many combinations of 3-D structure and motion can lead to the same optic flow, SFM is an ill-posed inverse problem. The rigidity hypothesis is a constraint supposed to formally solve the SFM problem and to account for human performance. Recently, however, a number of psychophysical results, with both moving and stationary human observers, have shown that the rigidity hypothesis alone cannot account for human performance in SFM tasks, but no model is known to account for the new results. Here, we construct a Bayesian model of SFM based mainly on one new hypothesis, that of stationarity, coupled with the rigidity hypothesis. The predictions of the model, calculated using a new and powerful methodology called Bayesian programming, account for a wide variety of experimental findings.
15.
Yuan X Hu ZZ Wu HT Torii M Narayanaswamy M Ravikumar KE Vijay-Shanker K Wu CH 《Bioinformatics (Oxford, England)》2006,22(13):1668-1669
A web-based version of the RLIMS-P literature mining system was developed for online mining of protein phosphorylation information from MEDLINE abstracts. The online tool presents extracted phosphorylation objects (phosphorylated proteins, phosphorylation sites and protein kinases) in summary tables and full reports with evidence-tagged abstracts. The tool further allows mapping of phosphorylated proteins to protein entries in the UniProt Knowledgebase based on PubMed ID and/or protein name. The literature mining, coupled with database association, allows retrieval of rich biological information for the phosphorylated proteins and facilitates database annotation of phosphorylation features.
16.
Background
Molecular experiments using multiplex strategies such as cDNA microarrays or proteomic approaches generate large datasets requiring biological interpretation. Text-based data mining tools have recently been developed to query large biological datasets of this type. PubMatrix is a web-based tool that allows simple text-based mining of the NCBI literature search service PubMed using any two lists of keyword terms, resulting in a frequency matrix of term co-occurrence.
17.
Guo Y 《Biometrics》2011,67(4):1532-1542
Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a prespecified group design matrix and the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multisubject spatiotemporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures in different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood (ML) method is used for estimating this general group ICA model. We propose two expectation-maximization (EM) algorithms to obtain the ML estimates. The first method is an exact EM algorithm, which provides an exact E-step and an explicit noniterative M-step. The second method is a variational approximation EM algorithm, which is computationally more efficient than the exact EM. In simulation studies, we first compare the performance of the proposed general group PICA model and the existing probabilistic group ICA approach. We then compare the two proposed EM algorithms and show the variational approximation EM achieves comparable accuracy to the exact EM with significantly less computation time. An fMRI data example is used to illustrate application of the proposed methods.
18.
19.
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search, identifying cellular location, and so on. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.
20.
BioContrasts: extracting and exploiting protein-protein contrastive relations from biomedical literature (cited by: 2 in total; self-citations: 0; citations by others: 2)
MOTIVATION: Contrasts are useful conceptual vehicles for learning processes and exploratory research of the unknown. For example, contrastive information between proteins can reveal what similarities, divergences and relations exist between the two proteins, leading to invaluable insights for a better understanding of the proteins. Such contrastive information is found to be reported in the biomedical literature. However, there have been no reported attempts in current biomedical text mining work that systematically extract and present such useful contrastive information from the literature for exploitation. RESULTS: Our BioContrasts system extracts protein-protein contrastive information from MEDLINE abstracts and presents the information to biologists in a web application for exploitation. Contrastive information is identified in the text abstracts with contrastive negation patterns such as 'A but not B'. A total of 799 169 pairs of contrastive expressions were successfully extracted from 2.5 million MEDLINE abstracts. Using grounding of contrastive protein names to Swiss-Prot entries, we were able to produce 41 471 pieces of contrasts between Swiss-Prot protein entries. These contrastive pieces of information are then presented via a user-friendly interactive web portal that can be exploited for applications such as the refinement of biological pathways. AVAILABILITY: BioContrasts can be accessed at http://biocontrasts.i2r.a-star.edu.sg. It is also mirrored at http://biocontrasts.biopathway.org. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
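The 'A but not B' negation pattern mentioned above can be illustrated with a regular expression. This is a deliberately naive sketch, not the BioContrasts implementation: it assumes each contrasted name is a single token made of word characters and hyphens, and it performs no protein-name recognition or grounding to Swiss-Prot. The pattern, function name and example sentence are hypothetical.

```python
import re

# Naive pattern: one token, the words "but not", then another token.
CONTRAST_PATTERN = re.compile(r"\b([\w-]+)\s+but\s+not\s+([\w-]+)")


def find_contrasts(text):
    """Return (A, B) pairs for every 'A but not B' match in the text."""
    return CONTRAST_PATTERN.findall(text)


sentence = "The substrate is phosphorylated by PKC but not PKA in these cells."
print(find_contrasts(sentence))
```

A real system would also need to handle multi-word names and confirm that both sides of the pattern are proteins, which is where the grounding step described in the abstract would come in.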