Similar articles
1.
ABSTRACT: BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time-series expression data by integrating various types of biological knowledge. RESULTS: We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources and external biological knowledge with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. CONCLUSIONS: We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.

2.
A reweighted atomic probability density is introduced as a means of representing ensembles of NMR structures in a simple, concise and informative manner. This density is shown to give a better visual representation of molecular structure information than an unweighted density, and should provide a useful interactive graphics tool during the course of iterative NMR structure refinement. The approach is illustrated using several examples.

3.
Determination of the relative gene order on chromosomes is of critical importance in the construction of human gene maps. In this paper we develop a sequential algorithm for gene ordering. We start by comparing three sequential procedures to order three genes on the basis of Bayesian posterior probabilities, maximum-likelihood ratio, and minimal recombinant class. In the second part of the paper we extend the sequential procedure based on the posterior probabilities to the general case of g genes. We present a theorem that states that the predicted average probability of committing a decision error, associated with a Bayesian sequential procedure that accepts the hypothesis of a gene-order configuration with posterior probability equal to or greater than π*, is smaller than 1 - π*. This theorem holds irrespective of the number of genes, the genetic model, and the source of genetic information. The theorem is an extension of a classical result of Wald, concerning the sum of the actual and the nominal error probabilities in the sequential probability ratio test of two hypotheses. A stepwise strategy for ordering a large number of genes, with control over the decision-error probabilities, is discussed. An asymptotic approximation is provided, which facilitates the calculation, with existing computer software for gene mapping, of the posterior probabilities of an order and of the error probabilities. We illustrate with some simulations that the stepwise ordering is an efficient procedure.
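
The stopping rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the equal priors, and the likelihood values are all hypothetical, and each new observation is assumed to arrive as a vector of likelihoods, one per candidate gene order. The procedure accepts an order as soon as its posterior probability reaches π*, so the posterior probability of error at acceptance is at most 1 - π*.

```python
def update_posterior(prior, likelihoods):
    """Bayes update over a finite set of candidate gene orders."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def sequential_order_decision(prior, likelihood_stream, pi_star):
    """Accept the first order whose posterior reaches pi_star.

    Returns (index of accepted order or None, its posterior, observations used).
    """
    posterior = list(prior)
    used = 0
    for likelihoods in likelihood_stream:
        posterior = update_posterior(posterior, likelihoods)
        used += 1
        best = max(range(len(posterior)), key=posterior.__getitem__)
        if posterior[best] >= pi_star:
            return best, posterior[best], used
    # No order reached the threshold with the data supplied.
    return None, max(posterior), used


# Hypothetical example: three candidate orders, equal priors, and the same
# made-up likelihood vector observed for every new family.
order, prob, n = sequential_order_decision(
    [1 / 3, 1 / 3, 1 / 3], [[0.9, 0.05, 0.05]] * 10, pi_star=0.95
)
```

With these made-up inputs the first order is accepted after the second observation, when its posterior first exceeds 0.95.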

4.
5.
6.
Wildfires are natural disasters with a significant impact on many rural communities. Predicting wildfire probability provides authorities with invaluable information to take preventive measures at the early stages. This study establishes Bayesian modelling for predicting the wildfire event probability based on a set of environmental predictors and forest vulnerability, represented by the normalized difference vegetation index. Prior information about the impact of these predictors on the likelihood of wildfire is available in the reports on past major wildfire events. In that sense, the use of prior information in the Bayesian models has the potential to provide accurate predictions for the wildfire probability. Moreover, the relationship between the predictors creates mediating effects on the likelihood of a wildfire event. A multivariate prior distribution in the Bayesian modelling can capture the mediating effects. In this study, Bayesian models with informative and noninformative priors are considered with independent and multivariate prior distributions to utilize the available prior information and handle the mediating effects between the predictors, using the normalized difference vegetation index data provided by Google Earth Engine. Nine years of data were gathered across 9841 sampled areas in a forested region of Australia. The modelling results show that forest vulnerability is the dominant predictor of wildfire probability. This modelling can help create a Wildfire Warning Index based on climate data and forest vulnerability measurements, enabling preventative actions in high-risk and targeted areas.

7.
In this study, microsatellite markers were employed to identify parentage relationships in Scylla paramamosain. The exclusion probability of loci was found to be related to their level of heterozygosity. When no parental information was available, or when one parent was known, the exclusion probability ranged from 22.0% to 56.6% and from 41.2% to 73.1%, respectively, with the combined exclusion probability for ten loci being 97.0% and 99.8%. The cumulative assignment success rate was 100% when no parental information was available, using the seven most informative microsatellite markers. Moreover, the power of the seven microsatellite markers for parentage assignment was tested by a double-blind test, which indicated that 95% of the progeny could be correctly assigned to their parents. This study provides a microsatellite-based approach for parentage assignment in S. paramamosain that will be useful for investigating genetic background and for molecular marker-assisted selective breeding in this important crab species.
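
The abstract does not say how the per-locus exclusion probabilities were combined. A common convention, assumed here, treats loci as independent, so the combined exclusion probability is one minus the product of the per-locus non-exclusion probabilities. The function name and the example values below are illustrative and are not taken from the study.

```python
from math import prod


def combined_exclusion(per_locus):
    """Combined exclusion probability, assuming independent loci.

    per_locus: iterable of single-locus exclusion probabilities in [0, 1].
    """
    return 1.0 - prod(1.0 - p for p in per_locus)


# Hypothetical per-locus values, chosen only to show the calculation.
example = combined_exclusion([0.5, 0.5])
```

Because each extra locus multiplies the non-exclusion probability by a factor below one, a panel of moderately informative loci can reach a high combined value.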

8.
MOTIVATION: Selection of the genes most relevant and informative for certain phenotypes is an important aspect of gene expression analysis. Most current methods select genes based on known phenotype information. However, certain sets of genes may correspond to new phenotypes that are as yet unknown, and it is important to develop novel, effective selection methods for their discovery without using any prior phenotype information. RESULTS: We propose and study a new method to select relevant genes based on their similarity information only. The method relies on a mechanism for discarding irrelevant genes. A two-way ordering of gene expression data can force irrelevant genes towards the middle of the ordering, where they can be discarded. Mechanisms based on variance and principal component analysis are also studied. When applied to expression profiles of colon cancer and leukemia, the unsupervised method outperforms the baseline algorithm that simply uses all genes, and it also selects relevant genes close to those selected using supervised methods. SUPPLEMENT: More results and software are online: http://www.nersc.gov/~cding/2way.

9.
Commonly observed patterns typically follow a few distinct families of probability distributions. Over one hundred years ago, Karl Pearson provided a systematic derivation and classification of the common continuous distributions. His approach was phenomenological: a differential equation that generated common distributions without any underlying conceptual basis for why common distributions have particular forms and what explains the familial relations. Pearson's system and its descendants remain the most popular systematic classification of probability distributions. Here, we unify the disparate forms of common distributions into a single system based on two meaningful and justifiable propositions. First, distributions follow maximum entropy subject to constraints, where maximum entropy is equivalent to minimum information. Second, different problems associate magnitude to information in different ways, an association we describe in terms of the relation between information invariance and measurement scale. Our framework relates the different continuous probability distributions through the variations in measurement scale that change each family of maximum entropy distributions into a distinct family. From our framework, future work in biology can consider the genesis of common patterns in a new and more general way. Particular biological processes set the relation between the information in observations and magnitude, the basis for information invariance, symmetry and measurement scale. The measurement scale, in turn, determines the most likely probability distributions and observed patterns associated with particular processes. This view presents a fundamentally derived alternative to the largely unproductive debates about neutrality in ecology and evolution.

10.

Background

A key to increasing the power of multilocus association tests is to reduce the number of degrees of freedom by suppressing noise from data. One of the difficulties is to decide how much noise to suppress. An often overlooked problem is that commonly used association tests based on genotype data cannot utilize the genetic information contained in spatial ordering of SNPs (see proof in the Appendix), which may prevent them from achieving higher power.

Results

We develop a score test based on the wavelet transform with empirical Bayesian thresholding. Extensive simulation studies are carried out under various LD structures, as well as using HapMap data from many different chromosomes, for both qualitative and quantitative traits. Simulation results show that the proposed test automatically adjusts the level of noise suppression according to the LD structure, and that it consistently achieves power higher than or similar to that of many commonly used association tests, including the principal component regression method (PCReg).

Conclusion

The wavelet-based score test automatically suppresses the right amount of noise and uses the information contained in spatial ordering of SNPs to achieve higher power.

11.
The rate of evolutionary change associated with a character determines its utility for the reconstruction of phylogenetic history. For a given age of lineage splits, we examine the information content of a character to assess the magnitude and range of an optimal rate of substitution. On the one hand, an optimal transition rate must provide sufficiently many character changes to distinguish subclades, whereas on the other hand changes must be sufficiently rare that reversals on a single branch (and hence homoplasy) are uncommon. In this study, we evolve binary characters over three tree topologies with fixed branch lengths, while varying transition rate as a parameter. We use the character state distribution obtained to measure the "information content" of a character given a transition rate. This is done with respect to several criteria: the probability of obtaining the correct tree using parsimony, the probability of inferring the correct ancestral state, and Shannon-Weaver and Fisher information measures on the configuration of probability distributions. All of the information measures suggest the intuitive result of the existence of optimal rates for phylogeny reconstruction. This nonzero optimum is less pronounced if one conditions on there having been a change, in which case the parsimony-based result that minimum change is the most informative tends to hold.

12.
Wu MC, Follmann DA. Biometrics 1999, 55(1):75-84
We discuss how to apply the conditional informative missing model of Wu and Bailey (1989, Biometrics 45, 939-955) to the setting where the probability of missing a visit depends on the random effects of the primary response in a time-dependent fashion. This includes the case where the probability of missing a visit depends on the true value of the primary response. Summary measures for missingness that are weighted sums of the indicators of missed visits are derived for these situations. These summary measures are then incorporated as covariates in a random effects model for the primary response. This approach is illustrated by analyzing data collected from a trial of heroin addicts where missed visits are informative about drug test results. Simulations of realistic experiments indicate that these time-dependent summary measures also work well under a variety of informative censoring models. These summary measures can achieve large reductions in estimation bias and mean squared errors relative to those obtained by using other summary measures.

13.
No fallacies in the formulation of the paternity index
In a recent publication, Li and Chakravarti claim to have shown that the paternity index is not a likelihood ratio. They present a method of estimating the prior probability of paternity from a sample of previous court cases on the basis of exclusions and nonexclusions. They propose calculating the posterior probability on the basis of this estimated prior and the test result expressed as exclusion/nonexclusion. Their claim is wrong: the paternity index is a likelihood ratio, that is, the ratio of the likelihoods of the observation conditional on the two mutually exclusive hypotheses. Their proposed method of estimating the prior has long been known, has been applied to several samples, and is inferior (in terms of variance of the estimate) to maximum likelihood estimation based on all the phenotypic information available. Their proposed "new method" of calculating a posterior probability is based on the use of a less informative likelihood ratio 1/(1-PE) instead of Gürtler's fully informative paternity index X/Y (Acta Med Leg Soc Liege 9:83-93, 1956), but is otherwise identical to the Bayesian approach originally introduced by Essen-Möller in 1938.
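
The Bayesian calculation this abstract refers to can be written out as a short sketch. It assumes the usual form of Bayes' theorem for two mutually exclusive hypotheses, with the likelihood ratio being either the paternity index X/Y or the cruder 1/(1-PE). The function names, the default prior of 0.5, and the numeric inputs are illustrative assumptions, not values from the paper.

```python
def posterior_paternity(likelihood_ratio, prior=0.5):
    """Posterior probability of paternity from a likelihood ratio and a prior.

    likelihood_ratio: P(observation | paternity) / P(observation | non-paternity),
    for example the paternity index X/Y.
    """
    numerator = likelihood_ratio * prior
    return numerator / (numerator + (1.0 - prior))


def exclusion_likelihood_ratio(pe):
    """Likelihood ratio 1/(1-PE) that uses only the exclusion/nonexclusion result."""
    return 1.0 / (1.0 - pe)


# Hypothetical inputs: a paternity index of 99 and an exclusion probability of 0.9.
w_full = posterior_paternity(99.0)
w_crude = posterior_paternity(exclusion_likelihood_ratio(0.9))
```

Both posteriors come from the same formula. They differ only in how much of the phenotypic evidence the likelihood ratio retains, which is the point of the abstract's criticism.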

14.
The posterior probability of linkage (PPL) is a Bayesian statistic which directly measures the probability of linkage between a trait locus and a marker (in the 2-point case) or a genomic region (in the multipoint case). It has several benefits, including ease of interpretation, the ability to incorporate prior genomic information, and a mathematically rigorous and robust procedure for accumulating linkage information across multiple heterogeneous datasets. To date, the majority of work on the PPL has focused on the development of the 2-point statistic, with only preliminary attempts at the development of an equivalent multipoint version. In this paper we present a new way of computing the multipoint PPL. This new version imputes to each genomic point an estimate of the 2-point PPL we would have obtained from a fully informative marker giving similar evidence for linkage. This version, which we call the imputed PPL, is shown to be superior to previously developed versions.

15.
Three-taxon statement analysis (3TA) and standard cladistic analysis (SCA) were evaluated relative to propositions of taxic homology. There are definite distinctions between complement relation homologs and paired homologs. The complement relation is discussed, relative to rooting, parsimony, and taxic propositions of homology. The complement relation, as implemented in SCA, makes sense only because SCA is a simple evolutionary model of character-state transformation. 3TA is a method for implementing complement relation data from a taxic perspective. The standard approach to cladistic analysis distinguishes taxa by rooting a tree, which means that that approach is incompatible with taxic propositions of homology, because a taxic homology is a hypothesis of relationship between taxa that possess a homolog relative to taxa that lack a homolog. It is not necessary to treat paired homologs from a transformational perspective to distinguish informative from uninformative data. 3TA yields results markedly different from those of SCA. SCA, which seeks to minimize tree length, may not maximize the relation of homology (congruence) relative to a tree.

16.
An analytical procedure for estimating the risk of X-linked diseases based on presence/absence of a series of restriction sites is presented. Multiple-locus linkage phase of the carrier mother is first inferred from previous offspring, from parents, and by molecular means. Bayesian risk estimates are then obtained using this information and the recombination-segregation distribution. The improvement afforded by using multiple flanking markers rather than a single marker is dramatic. Whereas the upper bound on the probability that a family will be informative using a single diallelic X-linked marker is .5, in the case of m markers, the bound on the probability of an informative family becomes 1 - .5^m. With a single linked marker, the precision in the risk estimate is bounded by the frequency of recombination, whereas the requirement of very tight linkage is relaxed somewhat when multiple flanking markers are used. Recombination interference and multiple-locus linkage disequilibria can further improve the risk estimates, but it is important to understand how the statistical confidence in these parameters affects the reliability of the risk estimates.
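
The bound quoted in this abstract is easy to tabulate. The sketch below only evaluates 1 - .5^m for a given number of markers m. The function name is hypothetical, and the abstract itself gives the bound without stating the assumptions behind it.

```python
def informative_family_bound(m):
    """Upper bound 1 - 0.5**m on the probability that a family is informative
    when m diallelic X-linked markers are used."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 - 0.5 ** m


# Bounds for one to four markers.
bounds = [informative_family_bound(m) for m in range(1, 5)]
```

The bound is .5 for one marker, .75 for two, and approaches 1 as markers are added, which is the improvement the abstract describes as dramatic.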

17.
18.
For a linked marker locus to be useful for genetic counseling, the counselee must be heterozygous for both disease and marker loci and his or her linkage phase must be known. It is shown that when the phenotypes of the counselee's previous children for the disease and marker loci are known, the linkage phase can often be inferred with a high probability, and thus it is possible to conduct genetic counseling. To evaluate the utility of linked marker genes for genetic counseling, the accuracy of prediction of the risk for a prospective child with a given marker gene to develop the genetic disease and the proportion of families in which a particular marker locus can be used for genetic counseling are studied for X-linked recessive, autosomal dominant, and autosomal recessive diseases. In the case of X-linked genetic diseases, information from children is very useful for determining the linkage phase of the counselee and predicting the genetic disease. In the case of autosomal dominant diseases, not all children are informative, but if the number of children is large, the phenotypes of children are often more informative than the information from grandparents. In the case of autosomal recessive diseases, information from grandparents is usually useless, since they show a normal phenotype for the disease locus. If we use information on the phenotypes of children, however, the linkage phase of the counselee and the risk of a prospective child can be inferred with a high probability. The proportion of informative families depends on the dominance relationship and frequencies of marker alleles, and the number of children. In general, codominant markers are more useful than are dominant markers, and a locus with high heterozygosity is more useful than is a locus with low heterozygosity.
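
Phase inference from previous children can be sketched for the simplest case. It is assumed here that every child is fully informative (as for sons in an X-linked setting), so each child is either a nonrecombinant or a recombinant under a given phase, and that recombination events are independent with a known fraction theta. The function name, the equal prior on the two phases, and the numbers are hypothetical. The paper's own treatment covers more general cases.

```python
def phase_posterior(theta, nonrecombinant, recombinant, prior=0.5):
    """Posterior probability of linkage phase A given previous children.

    nonrecombinant / recombinant: counts of children that would be
    nonrecombinant / recombinant if the counselee had phase A. Under the
    alternative phase B the two roles are swapped.
    """
    like_a = (1.0 - theta) ** nonrecombinant * theta ** recombinant
    like_b = theta ** nonrecombinant * (1.0 - theta) ** recombinant
    return prior * like_a / (prior * like_a + (1.0 - prior) * like_b)


# Hypothetical case: recombination fraction 0.1, two children, both
# consistent with phase A.
p_phase = phase_posterior(0.1, 2, 0)
```

In this example two concordant children already give a posterior of about 0.988 for phase A, consistent with the abstract's statement that the phase can often be inferred with high probability.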

19.
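Starting from reasonable assumptions, and using the data and related information provided in Problem A of the 2010 National Postgraduate Mathematical Contest in Modeling, this paper builds a gene expression profile model (GEPM) with the support of GIS and analyzes it spatially in order to extract informative genes for tumor identification. The results show that, of the 1,991 genes analyzed, 7 can serve as informative genes for tumor identification, and that building a GEPM with GIS techniques is feasible for the identification and diagnosis of tumors. The study thus provides a new method for gene identification and research.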

20.
MOTIVATION: The reconstruction of gene networks from gene-expression microarrays is gaining popularity as methods improve and as more data become available. The reliability of such networks could be judged by the probability that a connection between genes is spurious, resulting from chance fluctuations rather than from a true biological relationship. RESULTS: Unlike the false discovery rate and positive false discovery rate, the decisive false discovery rate (dFDR) is exactly equal to a conditional probability without assuming independence or the randomness of hypothesis truth values. This property is useful not only in the common application to the detection of differential gene expression, but also in determining the probability of a spurious connection in a reconstructed gene network. Estimators of the dFDR can estimate each of three probabilities: (1) The probability that two genes that appear to be associated with each other lack such association. (2) The probability that a time ordering observed for two associated genes is misleading. (3) The probability that a time ordering observed for two genes is misleading, either because they are not associated or because they are associated without a lag in time. The first probability applies to both static and dynamic gene networks, and the other two only apply to dynamic gene networks.
