Similar documents
 20 similar documents found (search time: 15 ms)
1.
Probability and paternity testing. (Cited 7 times: 5 self-citations, 2 by others)
A probability can be viewed as an estimate of a variable that is sometimes 1 and sometimes 0. To have validity, the probability must equal the expected value of that variable. To have utility, the average squared deviation of the probability from the value of that variable should be small. It is shown that probabilities of paternity calculated by the use of Bayes' theorem under appropriate assumptions are valid, but they can vary in utility. In particular, a recently proposed probability of paternity has less utility than the usual one based on the paternity index. Using an arbitrary prior probability in the calculation cannot lead to a valid probability unless, by chance, the chosen prior probability happens to be appropriate. Appropriate assumptions regarding both the prior probability and gene or genotypic frequencies can be estimated from prior experience.
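As a minimal sketch of the standard calculation this abstract refers to (not code from the paper), the posterior probability of paternity combines the paternity index (the likelihood ratio from the genetic evidence) with a prior via Bayes' theorem:

```python
def probability_of_paternity(paternity_index, prior=0.5):
    """Posterior probability of paternity via Bayes' theorem.

    paternity_index : likelihood ratio of the genetic evidence under
                      paternity vs. non-paternity
    prior           : prior probability that the alleged father is the
                      true father (0.5 is the conventional neutral prior)
    """
    numerator = paternity_index * prior
    return numerator / (numerator + (1.0 - prior))

# With PI = 19 and a neutral prior of 0.5, the posterior is 0.95.
posterior = probability_of_paternity(19)
```

As the abstract notes, this posterior is only valid if the chosen prior happens to be appropriate; with a paternity index of 1 (uninformative evidence) the posterior simply returns the prior.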

2.
This work addresses issues around physical maps, in particular for circular genomes. The overlapping relationship between two fragments obtained by applying two different restriction enzymes, separately, is classified as nonoverlapping, partial overlapping, or total overlapping. A double partial overlap can also appear in a particular situation. Taking into account DNA fragment lengths, and under the assumption that the left-hand endpoints of the two restriction fragments are independent random variables, each uniformly distributed along a circular genome, we present expressions for the prior probabilities of those events. This information is combined with hybridization data via Bayes' theorem in order to evaluate the corresponding posterior probabilities. Additionally, we perform a sensitivity analysis to quantify the effect of length variation on the results.
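Under the uniform-endpoint assumption, one such prior has a simple closed form: two arcs of lengths l1 and l2 on a circle of total length L overlap with probability (l1 + l2)/L when l1 + l2 < L. The sketch below (an illustration of the modeling assumption, not the paper's own expressions) checks this by Monte Carlo simulation:

```python
import random

def overlap_prior_exact(l1, l2, L):
    """Prior probability that two arcs overlap on a circle of length L,
    assuming independent uniform left endpoints and l1 + l2 < L."""
    return (l1 + l2) / L

def overlap_prior_mc(l1, l2, L, n=200_000, seed=1):
    """Monte Carlo estimate of the same prior probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # By rotational symmetry, fix arc 1 to cover [0, l1).
        s = rng.uniform(0, L)  # left endpoint of arc 2
        # Overlap iff arc 2 starts inside arc 1, or wraps past L back
        # into the start of arc 1.
        if s < l1 or s + l2 > L:
            hits += 1
    return hits / n
```

The simulated frequency converges to the closed-form prior, which is then combined with hybridization data via Bayes' theorem as the abstract describes.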

3.
Abstract

Quantitative risk assessment (QRA) approaches systematically evaluate the likelihood, impacts, and risk of adverse events. QRA using fault tree analysis (FTA) is based on the assumptions that failure events have crisp probabilities and that they are statistically independent. The crisp probabilities of the events are often absent, which leads to data uncertainty, while the independence assumption leads to model uncertainty. Experts' knowledge can be utilized to obtain unknown failure data; however, this process is itself subject to issues such as imprecision, incompleteness, and lack of consensus. For this reason, to minimize the overall uncertainty in QRA, in addition to addressing the uncertainties in the knowledge, it is equally important to combine the opinions of multiple experts and to update prior beliefs as new evidence becomes available. In this article, a novel methodology is proposed for QRA that combines fuzzy set theory and evidence theory with Bayesian networks to describe the uncertainties, aggregate experts' opinions, and update prior probabilities when new evidence becomes available. Additionally, sensitivity analysis is performed to identify the most critical events in the FTA. The effectiveness of the proposed approach is demonstrated via application to a practical system.

4.
Determination of the relative gene order on chromosomes is of critical importance in the construction of human gene maps. In this paper we develop a sequential algorithm for gene ordering. We start by comparing three sequential procedures for ordering three genes, based on Bayesian posterior probabilities, the maximum-likelihood ratio, and the minimal recombinant class. In the second part of the paper we extend the sequential procedure based on posterior probabilities to the general case of g genes. We present a theorem stating that the predicted average probability of committing a decision error, associated with a Bayesian sequential procedure that accepts the hypothesis of a gene-order configuration with posterior probability equal to or greater than π*, is smaller than 1 - π*. This theorem holds irrespective of the number of genes, the genetic model, and the source of genetic information. The theorem is an extension of a classical result of Wald concerning the sum of the actual and nominal error probabilities in the sequential probability ratio test of two hypotheses. A stepwise strategy for ordering a large number of genes, with control over the decision-error probabilities, is discussed. An asymptotic approximation is provided that facilitates calculation, with existing gene-mapping software, of the posterior probabilities of an order and of the error probabilities. We illustrate with simulations that stepwise ordering is an efficient procedure.

5.
Kenneth Lange, Genetica (1995) 96(1-2):107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, then the parameters characterizing the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each of the separate populations. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be employed to model the contributions from different ancestral populations in computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, then the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact facilitates computation of match probabilities and of tight upper bounds on match probabilities. Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.
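The moderation of extreme multinomial estimates follows directly from conjugacy: the posterior under a Dirichlet(α) prior and multinomial counts is Dirichlet(counts + α). A minimal sketch (illustrative α values, not the paper's fitted parameters):

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean allele frequencies under a Dirichlet prior.

    counts : observed allele counts in the sample (multinomial data)
    alpha  : Dirichlet prior parameters (e.g. fitted by maximum
             likelihood across several populations)

    The Dirichlet is conjugate to the multinomial, so the posterior is
    Dirichlet(counts + alpha); its mean shrinks raw sample proportions
    toward the prior mean.
    """
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]

# An allele unobserved in a small sample still receives a nonzero
# moderated estimate (1/13 here) instead of the extreme estimate 0.
freqs = dirichlet_posterior_mean([9, 1, 0], [1.0, 1.0, 1.0])
```

This shrinkage is exactly the empirical Bayes moderation described in the abstract.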

6.
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
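For discrete-state HMMs, the "filtering" computation of the probability that a model generated a sequence is the forward algorithm. The sketch below shows it on a toy model (generic symbols, not the paper's structural-class models):

```python
def forward_probability(obs, init, trans, emit):
    """Forward algorithm: P(observation sequence | HMM).

    obs         : list of observed symbols
    init[i]     : initial probability of hidden state i
    trans[i][j] : transition probability from state i to state j
    emit[i]     : dict mapping symbol -> emission probability in state i
    """
    n = len(init)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)
```

In the classification scheme the abstract describes, each candidate structural-class model would be scored this way on the full sequence, and the protein assigned to the class whose model gives the highest probability.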

7.
The problem of identifying significantly differentially expressed genes in replicated microarray experiments is an important one and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle it. This paper proposes a more powerful approach. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm; the problem definition of PaGE stays intact, and we simply estimate the needed prior probabilities. Our estimation method is similar to Storey's estimator without being a direct extension of it. We then modify the problem formulation for finding significantly differentially expressed genes and present an efficient method for finding them. This formulation increases power by directly incorporating Storey's estimator. We report preliminary results on the BRCA data set to demonstrate the applicability and effectiveness of our approach.

8.
Although RANSAC has proven robust, the original RANSAC algorithm selects hypothesis sets at random, generating numerous iterations and high computational costs because many hypothesis sets are contaminated with outliers. This paper presents a conditional sampling method, multiBaySAC (Bayes SAmple Consensus), that fuses the BaySAC algorithm with statistical testing of candidate model parameters for unorganized 3D point clouds in order to fit multiple primitives. The paper first presents a statistical testing algorithm for a candidate model parameter histogram to detect potential primitives. Because the detected initial primitives were optimized using a parallel strategy rather than a sequential one, every data point in the multiBaySAC algorithm was assigned multiple prior inlier probabilities, one for each initial primitive. Each prior inlier probability determines the probability that a point belongs to the corresponding primitive. We then implemented the conditional sampling method BaySAC in parallel. In each iteration of the hypothesis testing process, the hypothesis sets with the highest inlier probabilities were selected and verified for the existence of multiple primitives, yielding the fit for multiple primitives. Moreover, the initial probabilities were updated using a memorable form of Bayes' theorem, which relates the prior and posterior probabilities of a data point according to whether the hypothesis set to which the point belongs is correct. The proposed approach was tested using real and synthetic point clouds. The results show that the proposed multiBaySAC algorithm achieves high computational efficiency (averaging 34% higher than that of the sequential RANSAC method) and fitting accuracy (performing well at the intersection of two primitives), whereas the sequential RANSAC framework clearly suffers from over- and under-segmentation problems. Future work will aim at further optimizing this strategy through its application to other problems such as multiple point cloud co-registration and multiple image matching.

9.
Although Bayesian methods are widely used in phylogenetic systematics today, the foundations of this methodology are still debated among both biologists and philosophers. The Bayesian approach to phylogenetic inference requires the assignment of prior probabilities to phylogenetic trees. As in other applications of Bayesian epistemology, whether there is an objective way to assign these prior probabilities is a contested issue. This paper discusses the strategy of constraining the prior probabilities of phylogenetic trees by means of the Principal Principle. In particular, I discuss a proposal due to Velasco (Biol Philos 23:455-473, 2008) of assigning prior probabilities to tree topologies based on the Yule process. Invoking the Principal Principle, I argue that prior probabilities of tree topologies should instead be assigned using a weighted mixture of probability distributions based on Pinelis' (P Roy Soc Lond B Bio 270:1425-1431, 2003) multi-rate branching process, including both the Yule distribution and the uniform distribution. However, I argue that this solves the problem of the priors of phylogenetic trees only in a weak form.

10.
The pubic scars of pregnancy and parturition as reported in the anthropological literature are a variant of the recognized clinical entity, osteitis pubis. The os pubis from 86 pre-Columbian and colonial Peruvian mummies were examined for this entity. Osteitis pubis was found in 13 pre-Columbian females (72.3%) and 19 colonial Indians (57.6%). Four of the colonial women were buried with newborn children and one had a four-month-old fetus in utero. This type of study has application in paleopathology and forensic medicine to show possible frequency of pregnancy in a population, but must be refined and quantitated to achieve any more than this. This study is of further interest as it demonstrates that racial groups have apparently a wide difference in frequency of this particular lesion of the pubic symphysis, a matter for further investigation.

11.
Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics (Cited 1 time: 0 self-citations, 1 by others)
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. As the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each; instead they vary among data sets according to a statistical distribution, which is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.

12.
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. As an empirical example, we apply the method to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.

13.
Insertions and deletions in a profile hidden Markov model (HMM) are modeled by transition probabilities between insert, delete and match states. These are estimated by combining observed data and prior probabilities. The transition prior probabilities can be defined either ad hoc or by maximum likelihood (ML) estimation. We show that the choice of transition prior greatly affects the HMM's ability to discriminate between true and false hits. HMM discrimination was measured using the HMMER 2.2 package applied to 373 families from Pfam. We measured the discrimination between true members and noise sequences employing various ML transition priors and also systematically scanned the parameter space of ad hoc transition priors. Our results indicate that ML priors produce far from optimal discrimination, and we present an empirically derived prior that considerably decreases the number of misclassifications compared to ML. Most of the difference stems from the probabilities for exiting a delete state. The ML prior, which is unaware of noise sequences, estimates a delete-to-delete probability that is relatively high and does not penalize noise sequences enough for optimal discrimination.

14.
This work presents a novel pairwise statistical alignment method based on an explicit evolutionary model of insertions and deletions (indels). Indel events of any length are possible according to a geometric distribution. The geometric distribution parameter, the indel rate, and the evolutionary time are all maximum likelihood estimated from the sequences being aligned. Probability calculations are done using a pair hidden Markov model (HMM) with transition probabilities calculated from the indel parameters. Equations for the transition probabilities make the pair HMM closely approximate the specified indel model. The method provides an optimal alignment, its likelihood, the likelihood of all possible alignments, and the reliability of individual alignment regions. Human alpha and beta-hemoglobin sequences are aligned, as an illustration of the potential utility of this pair HMM approach.

15.
Bayes' theorem (1763) allows the posterior probability of heterozygosity for an X-linked gene to be expressed from two different kinds of information, namely: (1) the prior probability that the mother of an isolated case of Duchenne muscular dystrophy is a carrier, either by mutation of the gene in one of her parents or by segregation from earlier generations, or is herself the origin of the mutation; and (2) conditional probabilities, taking into consideration the existence of this woman's normal brothers, sons, or maternal uncles, and the serum creatine-kinase levels in the possible carrier(s) of the mutant gene. In some situations, these calculations give a recurrence risk that is lower than first expected, and can sometimes reassure anxious consultands about their genetic risk.
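The calculation this abstract describes can be sketched as follows (illustrative numbers, not the paper's cases): the prior is multiplied by the conditional probability of the observed pedigree facts under each hypothesis, then normalized.

```python
def carrier_posterior(prior, conditionals_if_carrier, conditionals_if_not):
    """Posterior probability of being a carrier, by Bayes' theorem.

    prior : prior probability that the consultand is a carrier
    conditionals_if_carrier / conditionals_if_not :
        probabilities of each observed fact (e.g. each unaffected son,
        a creatine-kinase result) under the two hypotheses
    """
    like_carrier = prior
    for p in conditionals_if_carrier:
        like_carrier *= p
    like_noncarrier = 1.0 - prior
    for p in conditionals_if_not:
        like_noncarrier *= p
    return like_carrier / (like_carrier + like_noncarrier)

# Illustrative example: a woman with prior 0.5 of carrying an X-linked
# mutation and three unaffected sons; each son is unaffected with
# probability 1/2 if she is a carrier and probability 1 if she is not.
# The posterior drops from 0.5 to 1/9.
risk = carrier_posterior(0.5, [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```

This is how the existence of normal sons lowers the recurrence risk below the prior, as the abstract notes.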

16.
Consider case control analysis with a dichotomous exposure variable that is subject to misclassification. If the classification probabilities are known, then methods are available to adjust odds-ratio estimates in light of the misclassification. We study the realistic scenario where reasonable guesses, but not exact values, are available for the classification probabilities. If the analysis proceeds by simply treating the guesses as exact, then even small discrepancies between the guesses and the actual probabilities can seriously degrade odds-ratio estimates. We show that this problem is mitigated by a Bayes analysis that incorporates uncertainty about the classification probabilities as prior information.
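The known-probabilities adjustment the abstract mentions can be sketched with the standard back-correction formula (an illustration of the general technique, not the paper's Bayesian method): each observed exposure proportion is corrected using assumed sensitivity and specificity, and the odds ratio is recomputed.

```python
def corrected_odds_ratio(p_case_obs, p_ctrl_obs, sens, spec):
    """Adjust an odds ratio for nondifferential exposure misclassification.

    p_case_obs, p_ctrl_obs : observed exposure proportions in cases and
                             controls
    sens, spec             : assumed classification probabilities
                             (sensitivity and specificity)

    Back-corrects each observed proportion via
        p_true = (p_obs - (1 - spec)) / (sens + spec - 1)
    and recomputes the odds ratio. Treating guessed sens/spec as exact
    is precisely what the paper warns can degrade the estimate; its
    Bayesian analysis places prior distributions on them instead.
    """
    def correct(p_obs):
        return (p_obs - (1.0 - spec)) / (sens + spec - 1.0)

    p1, p0 = correct(p_case_obs), correct(p_ctrl_obs)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))
```

With perfect classification the correction is the identity; with imperfect classification the corrected odds ratio moves away from the null relative to the naive observed odds ratio.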

17.
CMAJ (1983) 129(10):1093-1099
We have now shown you how to use decision analysis in making those rare, tough diagnostic decisions that are not soluble through other, easier routes. In summary, to "use more complex maths" the following steps will be useful: (1) create a decision tree or map of all the pertinent courses of action and their consequences; (2) assign probabilities to the branches of each chance node; (3) assign utilities to each of the potential outcomes shown on the decision tree; (4) combine the probabilities and utilities for each node on the decision tree; (5) pick the decision that leads to the highest expected utility; and (6) test your decision for its sensitivity to clinically sensible changes in probabilities and utilities. That concludes this series of clinical epidemiology rounds. You've come a long way from "doing it with pictures" and are now able to extract most of the diagnostic information that can be provided by signs, symptoms and laboratory investigations. We would appreciate learning whether you have found this series useful and how we can do a better job of presenting these and other elements of "the science of the art of medicine".
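The fold-back step (combine probabilities and utilities at each node, then pick the highest expected utility) can be sketched as a small recursion over a tree of chance, decision, and leaf nodes (a generic illustration with made-up utilities, not the article's clinical example):

```python
def expected_utility(node):
    """Fold back a decision tree to the best achievable expected utility.

    Node shapes:
      ('leaf', utility)                      - terminal outcome
      ('chance', [(prob, subtree), ...])     - probability-weighted average
      ('decide', {action: subtree, ...})     - pick the best action
    """
    kind, body = node
    if kind == 'leaf':
        return body
    if kind == 'chance':
        return sum(p * expected_utility(sub) for p, sub in body)
    # decision node: choose the action with the highest expected utility
    return max(expected_utility(sub) for sub in body.values())

# Illustrative treat-vs-wait decision (hypothetical probabilities and
# utilities): treating yields 0.7*0.9 + 0.3*0.6 = 0.81, waiting 0.65,
# so the fold-back picks treating.
tree = ('decide', {
    'treat': ('chance', [(0.7, ('leaf', 0.9)), (0.3, ('leaf', 0.6))]),
    'wait':  ('chance', [(0.7, ('leaf', 0.5)), (0.3, ('leaf', 1.0))]),
})
```

Sensitivity testing, the article's final step, amounts to re-running this fold-back while varying the probabilities and utilities over clinically sensible ranges.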

18.
Cenik C, Wakeley J, PLoS One (2010) 5(9):e13019
Pacific salmon include several species that are both commercially important and endangered. Understanding the causes of loss of genetic variation is essential for designing better conservation strategies. Here we use a coalescent approach to analyze a model of the complex life history of salmon and derive the coalescent effective population size (CES). With the aid of Kronecker products and a convergence theorem for Markov chains with two time scales, we derive a simple formula for the CES and thereby establish its existence. Our results may be used to address important questions regarding salmon biology, in particular the loss of genetic variation. To illustrate the utility of our approach, we consider the effects of fluctuations in population size over time. Our analysis enables the application of several tools of coalescent theory to the case of salmon.

19.
Innan H, Zhang K, Marjoram P, Tavaré S, Rosenberg NA, Genetics (2005) 169(3):1763-1777
Several tests of neutral evolution employ the observed number of segregating sites and properties of the haplotype frequency distribution as summary statistics and use simulations to obtain rejection probabilities. Here we develop a "haplotype configuration test" of neutrality (HCT) based on the full haplotype frequency distribution. To enable exact computation of rejection probabilities for small samples, we derive a recursion under the standard coalescent model for the joint distribution of the haplotype frequencies and the number of segregating sites. For larger samples, we consider simulation-based approaches. The utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

20.
In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability: the probability that a tree or clade is true given the data, the evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors (either flat or exponential distributions with moderate to large means) provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance, and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.

