Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The inability to identify fragile sites from data for single individuals remains the major obstacle to determining whether these chromosomal loci are predisposed to cancer-causing and evolutionary rearrangements. We describe a novel statistical model that is amenable to data from single individuals and that establishes site-specific chromosomal breakage as nonrandom with respect to the distribution of total breakage. Our method tests incrementally smaller subsets of the data for homogeneity under a multinomial model that assigns equal probabilities to a maximal set of nonfragile sites and unrestricted probabilities to the remaining fragile sites with significantly higher numbers of breaks. We show how standardized Pearson's chi-square (X²) and likelihood-ratio (G²) statistics can be appropriately used to measure goodness-of-fit for sparse contingency (individual-based) data in this model. A sample application of this approach indicates extensive variation in fragile sites among individuals and marked differences in fragile-site inferences from pooled as opposed to per-individual data.
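As a rough illustration of the two statistics named in this abstract, the sketch below computes the plain (unstandardized) Pearson X² and likelihood-ratio G² for a set of break counts against a multinomial with equal probabilities. It is not the authors' standardized statistics or their incremental subset procedure; the function name `goodness_of_fit` and the break counts are hypothetical.

```python
import math


def goodness_of_fit(counts):
    """Return (X2, G2, df) for observed counts under equal cell probabilities.

    X2 = sum((O - E)^2 / E) and G2 = 2 * sum(O * ln(O / E)), with E = n / k.
    Cells with zero observations contribute nothing to G2.
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    x2 = sum((o - expected) ** 2 / expected for o in counts)
    g2 = 2.0 * sum(o * math.log(o / expected) for o in counts if o > 0)
    return x2, g2, k - 1


# Hypothetical break counts at five candidate nonfragile sites.
breaks = [3, 5, 4, 2, 6]
x2, g2, df = goodness_of_fit(breaks)
print(f"X2={x2:.3f}  G2={g2:.3f}  df={df}")
```

With sparse per-individual counts the chi-square approximation for these raw statistics can be poor, which is presumably why the abstract refers to standardized versions.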

2.

Background  

Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE titles and abstracts by applying a probabilistic topic model.

3.
4.
5.
6.

Background

Motif analysis methods have long been central for studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen-based motif-finding tool.

Methods

We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.

Results

We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions

Regmex is a useful and flexible tool to analyze motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).

7.
Correction to A. Lourenço, M. Conover, A. Wong, A. Nematzadeh, F. Pan, H. Shatkay, and L.M. Rocha, "A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature", BMC Bioinformatics 2011, 12(Suppl 8):S12. doi:10.1186/1471-2105-12-S8-S12.

8.

Background  

Biomedical researchers often want to explore pathogenesis and pathways regulated by abnormally expressed genes, such as those identified by microarray analyses. Literature mining is an important way to assist in this task. Many literature mining tools are now available. However, few of them allow the user to make manual adjustments to zero in on what he or she wants to know in particular.

9.
Selective isolation of mycobacteria from soil: a statistical analysis approach   (total citations: 10; self-citations: 0; citations by others: 10)
We compared four decontamination methods for the isolation of mycobacteria from soil specimens. Different media were used: Löwenstein-Jensen, Ogawa and various modified Ogawa media. Statistical analysis demonstrated that the best results (low contamination and high positivity rates) were obtained when the specimens were incubated in trypticase soy broth, treated with solutions containing malachite green and cycloheximide, then decontaminated with sodium hydroxide and inoculated onto Ogawa media. The lowest contamination rates were obtained with Ogawa medium containing 500 µg cycloheximide ml⁻¹. The use of these techniques is proposed for the isolation of mycobacteria from heavily contaminated clinical specimens as well as from soil.

10.

Background  

In testing for differential gene expression involving multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between- and within-library variation. Several methods have been proposed, including the t test, the t_w test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated. Questions still remain on whether further improvements can be made.

11.
Zhao J, Yang TH, Huang Y, Holme P. PLoS ONE 2011, 6(9):e24306
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as to identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, co-occurrence and shared pathophysiology between different disorders.

12.
The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts which is computed from a statistical analysis of a large-scale corpus. Consequently, this model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results from a psychological experiment.

13.
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide-level correlation coefficient both for synthetic data and for 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that with random trees the MY sampler still has a performance similar to the original version.

14.
We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biologic interpretation of high-throughput experiments and may outperform the results of standard methods for the functional analysis of gene lists. GENECODIS is publicly available at .

15.
This paper applies a statistical thermodynamic approach to the kinetics of microbial growth influenced by pH. A general equation is developed and shown to provide a good theoretical basis for the existing pH models that have been widely used to describe the effects of pH on microbial growth kinetics. Four experimental data sets are used to test the general equation developed. The four data sets exhibit a variety of functional curve shapes, for example, symmetrical and asymmetrical bell shapes, when the specific growth rate of microorganisms is plotted as a function of pH. All four data sets are found to be well represented by the general equation. The existing pH model, however, is found to represent only one of the four data sets, i.e., the symmetrical case.

16.
Even Tjørve. Ecography 2002, 25(1):17-24
This paper discusses species diversity in simple multi-habitat environments. Its main purpose is to present simple mathematical and graphical models of how landscape patterns affect species numbers. The idea is to build models of species diversity in multi-habitat landscapes by combining species-area curves for different habitats. Predictions are made about how variables such as species richness and species overlap between habitats influence the proportion of the total landscape each habitat should constitute, and how many habitats it should be divided into in order to sustain the maximal number of species. Habitat size and numbers are the only factors discussed here, not habitat spatial patterns. Among the predictions are: 1) Where there are differences in species diversity between habitats, optimal landscape patterns contain larger proportions of species-rich habitats. 2) Species overlap between habitats shifts the optimum further towards larger proportions of species-rich habitat types. 3) Species overlap also shifts the optimum towards fewer habitat types. 4) Species diversity in landscapes with large species overlap is more resistant to changes in landscape (or reserve) size. This type of model approach can produce theories useful to nature and landscape management in general, and the design of nature reserves and national parks in particular.
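The idea of combining species-area curves can be sketched for the simplest case of two habitats with no species overlap. The power-law curve S = c·A^z, the parameter values, and the function names below are assumptions made for illustration; the abstract does not state which curve form or parameters the paper uses.

```python
def total_species(p, area, c1, c2, z):
    """Species count for two non-overlapping habitats sharing a landscape.

    Habitat 1 covers a fraction p of the area, habitat 2 the remainder.
    Each habitat follows an assumed power-law curve S = c * A**z.
    """
    return c1 * (p * area) ** z + c2 * ((1 - p) * area) ** z


def best_proportion(area, c1, c2, z, steps=100):
    """Grid-search the fraction of habitat 1 that maximizes total species."""
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda p: total_species(p, area, c1, c2, z))


# Hypothetical parameters: habitat 1 is twice as species-rich as habitat 2.
p_opt = best_proportion(area=1000.0, c1=20.0, c2=10.0, z=0.25)
print(f"optimal share of the richer habitat: {p_opt:.2f}")
```

Under these assumptions the optimum gives the richer habitat more than half of the landscape, in line with prediction 1; modelling species overlap (predictions 2 to 4) would need an extra term that this sketch leaves out.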

17.
18.
The traditional approaches of estimating heterogeneous properties in a soft tissue structure using optimization-based inverse methods often face difficulties because of the large number of unknowns to be simultaneously determined. This article proposes a new method for identifying the heterogeneous anisotropic nonlinear elastic properties in cerebral aneurysms. In this method, the local properties are determined directly from the pointwise stress–strain data, thus avoiding the need for simultaneously optimizing for the property values at all points/regions in the aneurysm. The stress distributions needed for a pointwise identification are computed using an inverse elastostatic method without invoking the material properties in question. This paradigm is tested numerically through simulated inflation tests on an image-based cerebral aneurysm sac. The wall tissue is modeled as an eight-ply laminate whose constitutive behavior is described by an anisotropic hyperelastic strain energy function containing four parameters. The parameters are assumed to vary continuously in the sac. Deformed configurations generated from forward finite element analysis are taken as input to inversely establish the parameter distributions. The delineated and the assigned distributions are in excellent agreement. A forward verification is conducted by comparing the displacement solutions obtained from the delineated and the assigned material parameters at a different pressure. The deviations in nodal displacements are found to be within 0.2% in most parts of the sac. The study highlights some distinct features of the proposed method, and demonstrates the feasibility of organ-level identification of the distributive anisotropic nonlinear properties in cerebral aneurysms.

19.
Numerous investigations in recent years have focused on chromosome arrangements in interphase nuclei. Recent experiments concerning the radial positioning of chromosomes in the nuclear volume of human and primate lymphocyte cells suggest a relationship between the gene density of a chromosome territory (CT) and its distance to the nuclear center. To relate chromosome positioning and gene density in a quantitative way, computer simulations of whole human cell nuclear genomes of normal karyotype were performed on the basis of the spherical 1 Mbp chromatin domain model and the latest data about sequence length and gene density of chromosomes. Three different basic assumptions about the initial distribution of chromosomes were used: a statistical, a deterministic, and a probabilistic initial distribution. After a simulated decondensation in early G1, a comparison of the radial distributions of simulated and experimentally obtained data for CTs Nos. 12, 18, 19, and 20 was made. It was shown that the experimentally observed distributions can be fitted better assuming an initial probabilistic distribution. This supports the concept of a probabilistic global gene positioning code depending on CT sequence length and gene density.

20.
Neurohormone secretion is viewed here as a variable (unknown) admixture of basal and pulsatile release mechanisms, convolved with individually fitted biexponential elimination kinetics. This construct allows maximum-likelihood estimates of both (regulated and constitutive) components of hormone secretion. Thereby, we infer that a prolonged slow-component half-life of gonadotropin removal and amplified pulsatile (and total) daily luteinizing hormone (LH) secretion rates jointly explicate the postmenopausal elevation in serum LH concentrations without a necessary rise in basal LH secretion rates. This biomathematical formulation should be useful in exploring other neuroregulatory mechanisms that underlie single or dual alterations in the basal versus pulsatile modes of hormone secretion.
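The forward model described in this abstract, secretion convolved with biexponential elimination, can be sketched as a discrete convolution. This covers only the forward simulation, not the maximum-likelihood estimation; the function names, time step, half-lives, and pulse size are all hypothetical values chosen for illustration.

```python
import math


def elimination(t, a, half_fast, half_slow):
    """Fraction of hormone remaining t minutes after release (biexponential)."""
    k_fast = math.log(2) / half_fast
    k_slow = math.log(2) / half_slow
    return a * math.exp(-k_fast * t) + (1 - a) * math.exp(-k_slow * t)


def concentration(secretion, dt, a, half_fast, half_slow):
    """Discrete convolution of a secretion-rate series with the elimination kernel."""
    out = []
    for i in range(len(secretion)):
        total = 0.0
        for j in range(i + 1):
            kernel = elimination((i - j) * dt, a, half_fast, half_slow)
            total += secretion[j] * kernel * dt
        out.append(total)
    return out


# Hypothetical secretion: constant basal release plus a single pulse at minute 10.
secretion = [0.1] * 60
secretion[10] += 5.0

conc = concentration(secretion, dt=1.0, a=0.6, half_fast=15.0, half_slow=90.0)
conc_slow = concentration(secretion, dt=1.0, a=0.6, half_fast=15.0, half_slow=180.0)
print(f"final concentration: {conc[-1]:.2f} vs {conc_slow[-1]:.2f} with a longer slow half-life")
```

Lengthening only the slow-component half-life raises the simulated concentration while basal secretion stays fixed, which is the kind of effect the abstract attributes to the postmenopausal case.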
