Similar Articles
1.
Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, high-throughput sequencing has seen little application in studies of polyploid populations. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction-enzyme-based techniques, e.g. GBS and RADseq) to allele frequency estimation in a unified inferential framework, using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences about autopolyploids beyond allele frequency estimation, by providing assessments of model adequacy and estimates of heterozygosity.
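
A minimal sketch of the core idea, summing over unknown allele dosage when estimating a biallelic allele frequency from read counts, is given below. It is not the polyfreqs implementation: the published model is hierarchical Bayesian and fitted by sampling, whereas this sketch swaps in an EM algorithm for the maximum-likelihood frequency. The binomial read model, the symmetric per-read error rate `error`, and the function names are illustrative assumptions.

```python
import math


def dosage_likelihoods(ref_reads, total_reads, ploidy, error=0.01):
    """Binomial likelihood of the read counts for each dosage g = 0..ploidy."""
    liks = []
    for g in range(ploidy + 1):
        # Chance that one read shows the reference allele, given g reference
        # copies and a symmetric per-read error rate (an assumption).
        q = (g / ploidy) * (1 - error) + (1 - g / ploidy) * error
        liks.append(math.comb(total_reads, ref_reads)
                    * q ** ref_reads * (1 - q) ** (total_reads - ref_reads))
    return liks


def em_allele_frequency(read_data, ploidy, error=0.01, max_iter=500, tol=1e-10):
    """EM estimate of the reference-allele frequency, summing over dosages.

    read_data: (ref_reads, total_reads) per individual. The dosage prior is
    Binomial(ploidy, p), i.e. Hardy-Weinberg proportions for an autopolyploid
    with random chromosome segregation (an assumption of this sketch).
    """
    liks = [dosage_likelihoods(r, n, ploidy, error) for r, n in read_data]
    p = 0.5
    for _ in range(max_iter):
        prior = [math.comb(ploidy, g) * p ** g * (1 - p) ** (ploidy - g)
                 for g in range(ploidy + 1)]
        total_dosage = 0.0
        for lik in liks:
            post = [l * w for l, w in zip(lik, prior)]
            # E-step: posterior mean dosage of this individual.
            total_dosage += sum(g * x for g, x in enumerate(post)) / sum(post)
        new_p = total_dosage / (ploidy * len(liks))  # M-step
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p
```

For example, `em_allele_frequency([(12, 20), (0, 15), (30, 31)], ploidy=4)` estimates a tetraploid frequency from three individuals' read counts. Because each individual contributes its posterior mean dosage rather than a hard genotype call, low-coverage individuals carry little information and are pulled toward the current frequency estimate.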

2.
Testing of Hardy–Weinberg proportions (HWP) with asymptotic goodness-of-fit tests is problematic when the contingency table of observed genotype counts has sparse cells or the sample size is low, and exact procedures are then preferable. Exact p-values can be (1) calculated via computationally demanding enumeration methods or (2) approximated via simulation methods. Our objective was to develop a new algorithm for exact tests of HWP with multiple alleles, based on conditional probabilities of genotype arrays, that is faster than existing algorithms. We derived an algorithm for calculating the exact permutation significance value without enumerating all genotype arrays having the same allele counts as the observed one. The algorithm can be used for testing HWP by (1) summing the conditional probabilities of occurrence of genotype arrays with smaller probability than the observed one and (2) comparing the sum with a nominal Type I error rate α. Application to published experimental data from seven maize populations showed that the exact test is computationally feasible and reduces the number of enumerated genotype count matrices by about 30% compared with previously published algorithms.
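
For the simplest case of a single biallelic locus, the conditional probability of a genotype array given the allele counts depends only on the number of heterozygotes, so full enumeration is cheap. The sketch below illustrates the summation rule in step (1) for that case; it is not the authors' multi-allele algorithm, which avoids enumerating all arrays, and the function name is hypothetical.

```python
import math


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact test of Hardy-Weinberg proportions at one biallelic locus."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab      # copies of allele A
    n_b = 2 * n - n_a          # copies of allele B

    def log_prob(het):
        # P(het | n, n_a) = n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Feasible heterozygote counts share the parity of n_a.
    support = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: math.exp(log_prob(het)) for het in support}
    observed = probs[n_ab]
    # Sum the probabilities of all arrays no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-7)))
```

The resulting p-value is then compared with the nominal Type I error rate α, as in step (2).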

3.
A clarification of the Hardy-Weinberg law
Stark AE, Genetics 2006, 174(3): 1695-1697
C. C. Li showed that Hardy-Weinberg proportions (HWP) can be maintained in a large population by nonrandom mating as well as random mating. In particular he gave the mating matrix for the symmetric case in the most general form possible. Thus Li showed that, once HWP are attained, the same proportions can be maintained by what he called pseudorandom mating. This article shows that, starting from any genotypic distribution at a single locus with two alleles, the same in each sex, HWP can be reached in one round of nonrandom mating with no change in allele frequency. In the model that demonstrates this fact, random mating is represented by a single point in a continuum of nonrandom possibilities.
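
The claim can be checked numerically: given any table of mating-type frequencies, offspring genotype frequencies follow from Mendelian transmission. The sketch below is not Stark's model; the nonrandom table is a hypothetical example constructed by hand to have the same margins as random mating from parents that are far from HWP, yet it yields Hardy-Weinberg offspring in one round with the allele frequency unchanged.

```python
TRANSMIT_A = (1.0, 0.5, 0.0)  # chance a parent of genotype AA, Aa, aa transmits A


def offspring_frequencies(mating):
    """Offspring (AA, Aa, aa) frequencies from a 3x3 mating-type table.

    mating[i][j] is the frequency of matings between a female of genotype i
    and a male of genotype j (order AA, Aa, aa); entries sum to 1.
    """
    f_aa = f_het = f_bb = 0.0
    for i in range(3):
        for j in range(3):
            m, ti, tj = mating[i][j], TRANSMIT_A[i], TRANSMIT_A[j]
            f_aa += m * ti * tj
            f_het += m * (ti * (1 - tj) + (1 - ti) * tj)
            f_bb += m * (1 - ti) * (1 - tj)
    return f_aa, f_het, f_bb


def hardy_weinberg(p):
    return p * p, 2 * p * (1 - p), (1 - p) ** 2


# Parents far from HWP: AA 0.5, Aa 0.2, aa 0.3, so p = 0.6.
parents = (0.5, 0.2, 0.3)
p = parents[0] + parents[1] / 2

# Random mating: mating-type frequencies are products of the margins.
random_table = [[gi * gj for gj in parents] for gi in parents]

# Hypothetical nonrandom table with the same margins: no Aa x Aa matings,
# and more AA x Aa and Aa x aa matings than under random mating.
nonrandom_table = [[0.24, 0.12, 0.14],
                   [0.12, 0.00, 0.08],
                   [0.14, 0.08, 0.08]]

print(offspring_frequencies(random_table), hardy_weinberg(p))
print(offspring_frequencies(nonrandom_table))
```

Both tables give offspring frequencies of about (0.36, 0.48, 0.16), i.e. HWP at p = 0.6, so random mating is just one point among many mating patterns that reach HWP in one round.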

4.

Background

This article describes classical and Bayesian interval estimation of genetic susceptibility based on random samples with pre-specified numbers of unrelated cases and controls.

Results

Frequencies of genotypes in cases and controls can be estimated directly from retrospective case-control data. On the other hand, genetic susceptibility, defined as the expected proportion of cases among individuals with a particular genotype, depends on the population proportion of cases (prevalence). Under this design, prevalence is an external parameter, and hence the susceptibility cannot be estimated from the observed data alone. Interval estimation of susceptibility that can incorporate uncertainty in prevalence values is explored from both classical and Bayesian perspectives. The similarity between classical and Bayesian interval estimates in terms of frequentist coverage probabilities for this problem allows an appealing interpretation of classical intervals as bounds for genetic susceptibility. In addition, the asymptotic classical and Bayesian interval estimates are observed to have comparable average lengths. These interval estimates serve as a very good approximation to the "exact" (finite sample) Bayesian interval estimates. Extension from genotypic to allelic susceptibility intervals shows dependency on phenotype-induced deviations from Hardy-Weinberg equilibrium.

Conclusions

The suggested classical and Bayesian interval estimates appear to perform reasonably well. Generally, the exact Bayesian interval estimation method is recommended for genetic susceptibility; however, the asymptotic classical and approximate Bayesian methods are adequate for sample sizes of at least 50 cases and controls.
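
Susceptibility follows from Bayes' theorem: for genotype g and prevalence K, P(case | g) = K·P(g | case) / [K·P(g | case) + (1 - K)·P(g | control)], treating controls as representative of non-cases. The sketch below propagates uncertainty in both genotype frequencies and prevalence by Monte Carlo. The uniform Beta(1, 1) priors on the genotype frequencies, the uniform prior over a prevalence range, and the function name are illustrative assumptions, not necessarily the choices made in the article.

```python
import random


def susceptibility_interval(case_carriers, n_cases, control_carriers, n_controls,
                            prevalence_range, draws=20000, level=0.95, seed=1):
    """Monte Carlo interval for P(case | genotype) under prevalence uncertainty."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Posterior draws of the genotype frequency in cases and in controls
        # (Beta(1, 1) priors, an assumption).
        f_case = rng.betavariate(case_carriers + 1, n_cases - case_carriers + 1)
        f_ctrl = rng.betavariate(control_carriers + 1,
                                 n_controls - control_carriers + 1)
        # Prevalence is external to case-control data: draw it from a range.
        k = rng.uniform(*prevalence_range)
        samples.append(k * f_case / (k * f_case + (1 - k) * f_ctrl))
    samples.sort()
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper
```

For example, `susceptibility_interval(60, 100, 30, 100, prevalence_range=(0.01, 0.02))` returns an equal-tailed 95% interval for carriers when 60 of 100 cases and 30 of 100 controls carry the genotype.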

5.
Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption (specifically, that difficult-to-impute SNPs tend to have larger effects), and we assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate, namely their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
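
The posterior-mean approximation is simple to state: an imputed genotype with probabilities P(0), P(1), P(2) is replaced by its expected allele count P(1) + 2·P(2). The sketch below uses that dosage in an ordinary least-squares regression purely for illustration; BIMBAM itself computes Bayes factors, and these function names are hypothetical.

```python
def expected_dosage(probs):
    """Posterior mean allele count from imputed probabilities (P0, P1, P2)."""
    return probs[1] + 2 * probs[2]


def dosage_regression(genotype_probs, phenotypes):
    """Least-squares intercept and slope of phenotype on mean imputed dosage."""
    x = [expected_dosage(p) for p in genotype_probs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, phenotypes))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Confidently imputed individuals enter with dosages near 0, 1 or 2, while uncertain ones enter with intermediate values, which is what makes the plug-in both cheap and easy to summarize per SNP.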

6.
A two-stage Bayesian method is presented for analyzing case-control studies in which a binary variable is sometimes measured with error but the correct values of the variable are known for a random subset of the study group. The first stage of the method is analytically tractable and MCMC methods are used for the second stage. The posterior distribution from the first stage becomes the prior distribution for the second stage, thus transferring all relevant information between the stages. The method makes few distributional assumptions and requires no asymptotic approximations. It is computationally fast and can be run using standard software. It is applied to two data sets that have been analyzed by other methods, and results are compared.
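
The abstract does not spell out the first stage. One analytically tractable form, assumed here purely for illustration, is conjugate Beta updating of the sensitivity and specificity of the error-prone variable from the validated subset; the resulting Beta posteriors could then serve as priors in an MCMC second stage. The function names and the Beta(1, 1) default prior are hypothetical.

```python
def misclassification_posterior(validated, prior=(1.0, 1.0)):
    """Conjugate Beta posteriors for sensitivity and specificity.

    validated: (true_value, recorded_value) pairs of 0/1 for the subset whose
    correct values are known. Returns (alpha, beta) parameters for
    sensitivity and for specificity.
    """
    a0, b0 = prior
    tp = sum(1 for truth, rec in validated if truth == 1 and rec == 1)
    fn = sum(1 for truth, rec in validated if truth == 1 and rec == 0)
    tn = sum(1 for truth, rec in validated if truth == 0 and rec == 0)
    fp = sum(1 for truth, rec in validated if truth == 0 and rec == 1)
    return (a0 + tp, b0 + fn), (a0 + tn, b0 + fp)


def beta_mean(a, b):
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)
```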

7.
MOTIVATION: There are several levels of uncertainty involved in the mathematical modelling of biochemical systems. There may often be a degree of uncertainty about the values of kinetic parameters, about the general structure of the model and about the behaviour of biochemical species that cannot be observed directly. The methods of Bayesian inference provide a consistent framework for modelling and predicting under these uncertain conditions. We present a software package for applying the Bayesian inferential methodology to problems in systems biology. RESULTS: Described herein is a software package, BioBayes, which provides a framework for Bayesian parameter estimation and evidential model ranking over models of biochemical systems defined using ordinary differential equations. The package is extensible, allowing additional modules to be included by developers. No other available package provides this functionality.
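
BioBayes's own interface is not described here, so the following is a generic sketch of Bayesian parameter estimation for an ODE model rather than the package's method. It assumes a one-parameter decay model dx/dt = -k·x with known initial value and Gaussian measurement noise, solves it with forward Euler (so the same pattern applies to ODEs without closed-form solutions), and samples k with a random-walk Metropolis algorithm under a flat prior on k > 0. All names and settings are hypothetical.

```python
import math
import random


def simulate_decay(k, x0, times, dt=0.01):
    """Forward-Euler solution of dx/dt = -k * x at increasing `times`."""
    out, x, t = [], x0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)
            x += h * (-k * x)
            t += h
        out.append(x)
    return out


def log_posterior(k, data, times, x0, sigma):
    """Gaussian log-likelihood plus a flat prior on k > 0 (up to a constant)."""
    if k <= 0:
        return -math.inf
    pred = simulate_decay(k, x0, times)
    return -sum((y - m) ** 2 for y, m in zip(data, pred)) / (2 * sigma ** 2)


def metropolis(data, times, x0=10.0, sigma=0.2, steps=3000, scale=0.02, seed=0):
    """Random-walk Metropolis sampler for the rate constant k."""
    rng = random.Random(seed)
    k = 1.0
    lp = log_posterior(k, data, times, x0, sigma)
    chain = []
    for _ in range(steps):
        proposal = k + rng.gauss(0.0, scale)
        lp_new = log_posterior(proposal, data, times, x0, sigma)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            k, lp = proposal, lp_new
        chain.append(k)
    return chain
```

Discarding the first part of the chain as burn-in, the remaining draws approximate the posterior of k; ranking competing models, as BioBayes also does, would additionally require estimating each model's evidence.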

8.
In quantitative genetics, Markov chain Monte Carlo (MCMC) methods are indispensable for statistical inference in non-standard models like generalized linear models with genetic random effects or models with genetically structured variance heterogeneity. A particular challenge for MCMC applications in quantitative genetics is to obtain efficient updates of the high-dimensional vectors of genetic random effects and the associated covariance parameters. We discuss various strategies to approach this problem including reparameterization, Langevin-Hastings updates, and updates based on normal approximations. The methods are compared in applications to Bayesian inference for three data sets using a model with genetically structured variance heterogeneity.
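
A Langevin-Hastings (Metropolis-adjusted Langevin) update shifts each proposal along the gradient of the log target before adding Gaussian noise, then corrects with a Metropolis-Hastings acceptance step. The sketch below applies it to a toy standard-normal target standing in for a high-dimensional vector of genetic random effects; the step size and target are illustrative choices, not the models compared in the article.

```python
import math
import random


def mala(log_density, grad, x0, step, n_iter, seed=0):
    """Metropolis-adjusted Langevin (Langevin-Hastings) sampler.

    Proposal: x' = x + (step / 2) * grad(x) + sqrt(step) * N(0, I),
    accepted with the usual Metropolis-Hastings ratio.
    """
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log proposal density of moving frm -> to, up to a constant.
        g = grad(frm)
        return -sum((t - f - 0.5 * step * gi) ** 2
                    for t, f, gi in zip(to, frm, g)) / (2 * step)

    x, accepted, samples = list(x0), 0, []
    for _ in range(n_iter):
        g = grad(x)
        prop = [xi + 0.5 * step * gi + math.sqrt(step) * rng.gauss(0.0, 1.0)
                for xi, gi in zip(x, g)]
        log_alpha = (log_density(prop) + log_q(x, prop)
                     - log_density(x) - log_q(prop, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = prop, accepted + 1
        samples.append(x)
    return samples, accepted / n_iter


# Toy target: a 10-dimensional standard normal, started away from the mode.
samples, rate = mala(lambda x: -0.5 * sum(v * v for v in x),
                     lambda x: [-v for v in x],
                     [2.0] * 10, step=0.5, n_iter=5000)
```

The gradient term lets every coordinate move toward high-density regions at once, which is why such updates are attractive for long vectors of random effects compared with plain random-walk proposals.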

9.
W G Hill, Biometrics 1975, 31(4): 881-888
Methods are outlined for analyzing data on genotype frequencies at several codominant loci in random mating diploid populations. Maximum likelihood (ML) methods are given for estimating chromosomal frequencies. Using these, a succession of models of assumed independence of gene frequency is fitted. These are based on those used in multi-dimensional contingency tables, and tests for association (linkage disequilibrium) are made using likelihood ratios. The methods are illustrated with an example.
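
For two codominant biallelic loci, the only ambiguous individuals are double heterozygotes (AaBb), whose phase may be AB/ab or Ab/aB. The sketch below computes chromosomal (haplotype) frequencies with the EM algorithm, a standard route to the ML estimates under random mating, and then the linkage disequilibrium coefficient D = p_AB - p_A·p_B. Hill's own estimation procedure may differ, the likelihood can have more than one stationary point, and the function names are hypothetical.

```python
def two_locus_em(known, n_double_het, max_iter=1000, tol=1e-12):
    """EM estimates of haplotype frequencies (AB, Ab, aB, ab).

    known: haplotype counts (AB, Ab, aB, ab) from individuals whose phase is
    unambiguous (homozygous at one or both loci).
    n_double_het: number of AaBb individuals, phase AB/ab or Ab/aB unknown.
    """
    total = sum(known) + 2 * n_double_het
    freqs = [0.25] * 4
    for _ in range(max_iter):
        coupling = freqs[0] * freqs[3]   # AB/ab
        repulsion = freqs[1] * freqs[2]  # Ab/aB
        denom = coupling + repulsion
        share = coupling / denom if denom > 0 else 0.5
        # E-step: split double heterozygotes between the two phases.
        counts = [known[0] + n_double_het * share,
                  known[1] + n_double_het * (1 - share),
                  known[2] + n_double_het * (1 - share),
                  known[3] + n_double_het * share]
        new = [c / total for c in counts]  # M-step
        if max(abs(a - b) for a, b in zip(new, freqs)) < tol:
            return new
        freqs = new
    return freqs


def linkage_disequilibrium(freqs):
    """D = p_AB - p_A * p_B."""
    p_a = freqs[0] + freqs[1]
    p_b = freqs[0] + freqs[2]
    return freqs[0] - p_a * p_b
```

A likelihood-ratio test of association, as in the abstract, compares the maximized log-likelihood under these estimates with that under D = 0 (haplotype frequencies equal to products of allele frequencies).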

10.
In infectious disease epidemiology, statistical methods are an indispensable component for the automated detection of outbreaks in routinely collected surveillance data. So far, methodology in this area has been largely frequentist in nature and has increasingly been taking inspiration from statistical process control. The present work is concerned with strengthening Bayesian thinking in this field. We extend the widely used approach of Farrington et al. and Heisterkamp et al. to a modern Bayesian framework within a time series decomposition context. This approach facilitates a direct calculation of the decision-making threshold while taking all sources of uncertainty in both prediction and estimation into account. More importantly, the methodology now also makes it possible to integrate covariate processes, e.g. weather influence, into outbreak detection. Model inference is performed using fast and efficient integrated nested Laplace approximations, enabling the use of this method in routine surveillance at public health institutions. Performance of the algorithm was investigated by comparing it with existing methods in simulations, as well as by analysing the time series of notified campylobacteriosis cases in Germany for the years 2002–2011, with absolute humidity included as a covariate process. Altogether, a flexible and modern surveillance algorithm is presented, with an implementation available through the R package ‘surveillance’.
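
A heavily simplified, conjugate stand-in for a Bayesian outbreak threshold is sketched below: weekly counts are modelled as Poisson with a Gamma prior on the rate, so the posterior predictive count for the current week is negative binomial, and its upper quantile serves as the alarm threshold. It ignores trend, seasonality and covariates, it is not the INLA-based model or the implementation in the `surveillance` package, and the prior values and function name are hypothetical.

```python
import math


def nb_upper_threshold(baseline_counts, alpha=0.01, a=1.0, b=0.001):
    """Upper (1 - alpha) quantile of the posterior predictive weekly count.

    counts ~ Poisson(rate), rate ~ Gamma(shape a, rate b). The posterior is
    Gamma(a + sum(y), b + n), so the predictive count is negative binomial
    with size r = a + sum(y) and success probability (b + n) / (b + n + 1).
    """
    r = a + sum(baseline_counts)
    beta = b + len(baseline_counts)
    prob = beta / (beta + 1.0)
    cdf, k = 0.0, 0
    while True:
        log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                   + r * math.log(prob) + k * math.log1p(-prob))
        cdf += math.exp(log_pmf)
        if cdf >= 1 - alpha:
            return k
        k += 1
```

A week would be flagged when its observed count exceeds the threshold computed from comparable baseline weeks; because the threshold comes from the predictive distribution, it already reflects uncertainty about the rate as well as Poisson noise.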

11.
SPAGeDi version 1.0 is a software package primarily designed to characterize the spatial genetic structure of mapped individuals or populations using genotype data of codominant markers. It computes various statistics describing genetic relatedness or differentiation between individuals or populations by pairwise comparisons and tests their significance by appropriate numerical resampling. SPAGeDi is useful for: (i) detecting isolation by distance within or among populations and estimating gene dispersal parameters; (ii) assessing genetic relatedness between individuals and its actual variance, a parameter of interest for marker-based inferences of quantitative inheritance; (iii) assessing genetic differentiation among populations, including the case of haploids or autopolyploids.

12.
Phylogenetic comparative methods use tree topology, branch lengths, and models of phenotypic change to take into account nonindependence in statistical analysis. However, these methods normally assume that trees and models are known without error. Approaches relying on evolutionary regimes also assume specific distributions of character states across a tree, which often result from ancestral state reconstructions that are subject to uncertainty. Several methods have been proposed to deal with some of these sources of uncertainty, but approaches accounting for all of them are less common. Here, we show how Bayesian statistics facilitates this task while relaxing the homogeneous rate assumption of the well-known phylogenetic generalized least squares (PGLS) framework. This Bayesian formulation allows uncertainty about phylogeny, evolutionary regimes, or other statistical parameters to be taken into account for studies as simple as testing for coevolution in two traits or as complex as testing whether bursts of phenotypic change are associated with evolutionary shifts in intertrait correlations. A mixture of validation approaches indicates that the approach has good inferential properties and predictive performance. We provide suggestions for implementation and show its usefulness by exploring the coevolution of ankle posture and forefoot proportions in Carnivora.
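
For reference, the standard PGLS estimator that the Bayesian formulation generalizes is generalized least squares, `beta = (X' V^-1 X)^-1 X' V^-1 y`, where V is the Brownian-motion covariance among species: the shared branch length from the root to each pair's most recent common ancestor. The sketch below assumes a fixed, hypothetical four-species tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5), made-up trait values, and a single homogeneous rate, which are exactly the assumptions the article relaxes.

```python
import numpy as np


def gls(X, y, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y."""
    vinv_x = np.linalg.solve(V, X)  # V^-1 X without forming the inverse
    vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ vinv_x, X.T @ vinv_y)


# Brownian-motion covariance for the tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5):
# diagonal = root-to-tip length, off-diagonal = shared path from the root.
V = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.5],
              [0.0, 0.0, 0.5, 2.0]])
x = np.array([0.1, 0.5, 1.2, 2.0])          # predictor trait (hypothetical)
X = np.column_stack([np.ones_like(x), x])   # intercept and slope columns
y = np.array([1.3, 1.8, 3.9, 5.2])          # response trait (hypothetical)
beta = gls(X, y, V)
```

With V equal to the identity the estimator reduces to ordinary least squares; a Bayesian version would additionally place priors on beta and the rate, and could average over a sample of trees instead of fixing V.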

13.
Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models, which rely on Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the model parameters. In this article, we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model that uses logistic link functions to model the probability of species occurrence at sites and the species detection probabilities. This is accomplished through iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. Specifically, we show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the model parameters when the number of visits per site (K) is as low as three, and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).
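
Whatever approximation is used for the posterior, PAO rests on a simple conditional probability: a site where the species was detected is occupied, while a site with K non-detections is occupied with probability ψ(1 - p)^K / [ψ(1 - p)^K + (1 - ψ)], where ψ is the occupancy probability and p the detection probability. The sketch below evaluates this for fixed ψ and p (no logistic covariates); in a VB or MCMC analysis one would average it over draws from the (approximate) posterior. The function names are hypothetical.

```python
def prob_occupied_given_no_detections(psi, p, visits):
    """P(site occupied | species never detected in `visits` surveys)."""
    missed = psi * (1 - p) ** visits
    return missed / (missed + (1 - psi))


def expected_pao(detection_histories, psi, p):
    """Expected proportion of sites occupied, given fixed psi and p."""
    total = 0.0
    for history in detection_histories:
        if any(history):
            total += 1.0  # a detection confirms occupancy
        else:
            total += prob_occupied_given_no_detections(psi, p, len(history))
    return total / len(detection_histories)
```

With ψ = p = 0.5 and K = 3, an all-zero history gives 0.0625 / 0.5625 = 1/9, so `expected_pao([[1, 0, 0], [0, 0, 0]], 0.5, 0.5)` is (1 + 1/9) / 2 = 5/9.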

14.
Li (1988) showed that random mating is a sufficient, not a necessary, condition for the Hardy-Weinberg principle. He therefore called a nonrandom mating population that behaves like a random mating population a ‘pseudo-random mating population’. The pseudo-random mating systems he studied focused on populations in which the parental generation is in Hardy-Weinberg proportions. In other words, the mating type frequency deviations from random mating for each parental genotype add up to zero. In this article these restrictions are relaxed, and new pseudo-random mating systems that immediately yield Hardy-Weinberg offspring are obtained. This is possible because reciprocal crosses have identical segregation probabilities for an autosomal locus, and manipulating the combined frequency of reciprocal crosses does not change the gene frequency of the population. A comparison of these new patterns with that of Li is given in the Discussion.

15.
Sillanpää MJ, Arjas E, Genetics 1999, 151(4): 1605-1619
A general fine-scale Bayesian quantitative trait locus (QTL) mapping method for outcrossing species is presented. It is suitable for an analysis of complete and incomplete data from experimental designs of F2 families or backcrosses. The amount of genotyping of parents and grandparents is optional, as well as the assumption that the QTL alleles in the crossed lines are fixed. Grandparental origin indicators are used, but without forgetting the original genotype or allelic origin information. The method treats the number of QTL in the analyzed chromosome as a random variable and allows some QTL effects from other chromosomes to be taken into account in a composite interval mapping manner. A block-update of ordered genotypes (haplotypes) of the whole family is sampled once in each marker locus during every round of the Markov Chain Monte Carlo algorithm used in the numerical estimation. As a byproduct, the method gives the posterior distributions for linkage phases in the family and therefore it can also be used as a haplotyping algorithm. The Bayesian method is tested and compared with two frequentist methods using simulated data sets, considering two different parental crosses and three different levels of available parental information. The method is implemented as a software package and is freely available under the name Multimapper/outbred at URL http://www.rni.helsinki.fi/mjs/.

16.
The estimation of outcrossing rates in hermaphroditic species has been a major focus in the evolutionary study of reproductive strategies, and is also essential for plant breeding and conservation. Surprisingly, genomics has thus far minimally influenced outcrossing rate studies. In this article, we generalize a Bayesian inference method (BORICE) to accommodate genomic data from multiple subpopulations of a species. As an empirical demonstration, BORICE is applied to 115 maternal families of Mimulus guttatus. The analysis shows that low-level whole genome sequencing of parents and offspring is sufficient for individualized mating system estimation: 208 offspring (88.5%) were definitively called as outcrossed, 23 (9.8%) as selfed. After mating system parameters are established (each offspring as outcrossed or selfed and the inbreeding level of maternal plants), BORICE outputs posterior genotype probabilities for each SNP genomewide. Individual SNP calls are often burdened with considerable uncertainty, and distilling information from closely linked sites (within genomic windows) can be a useful strategy. For the Mimulus data, principal components based on window statistics were sufficient to diagnose inversion polymorphisms and estimate their effects on spatial structure, phenotypic and fitness measures. More generally, mating system estimation with BORICE can set the stage for population and quantitative genomic analyses, particularly if researchers collect phenotypic or fitness data from maternal individuals.

17.
We compared the performance of Bayesian learning strategies and approximations to such strategies, which are far less computationally demanding, in a setting requiring individuals to make binary decisions based on experience. Extending Bayesian updating schemes, we compared the different strategies while allowing for various implementations of memory and knowledge about the environment. The dynamics of the observable variables were modeled through basic probability distributions and convolution. This theoretical framework was applied to the problem of male fruit flies that have to decide which females they should court. Computer simulations indicated that, for most parameter values, approximations to the Bayesian strategy performed as well as the full Bayesian one. The linear approximation, reminiscent of the linear operator, was notably successful and, without innate knowledge, the only successful learning strategy. Besides being less demanding in computation and thus realistic for small brains, the linear approximation was also successful at limited memory, which would translate into robustness in rapidly changing environments. Knowledge about the environment boosted the performance of the various learning strategies, with maximal performance at large utilization of memory. Only at limited memory capacities was intermediate knowledge most successful. We conclude that many animals may rely on algorithms that involve approximations rather than full Bayesian calculations, because such approximations achieve high levels of performance with only a fraction of the computational requirements, in particular for extensions of Bayesian updating schemes, which can represent universal and realistic environments.
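
The two kinds of learner can be contrasted in a few lines for a binary outcome (for example, whether a courted female is receptive). The Bayesian learner keeps a Beta posterior over the success probability; the linear-operator learner moves its estimate a fixed fraction toward each new outcome. The Beta(1, 1) prior, the learning rate, and the function names are hypothetical; the article's models include memory and environment structure not shown here.

```python
def bayesian_estimate(outcomes, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a success probability under a Beta prior."""
    successes = sum(outcomes)
    return (prior_a + successes) / (prior_a + prior_b + len(outcomes))


def linear_operator_estimate(outcomes, start=0.5, rate=0.2):
    """Move the estimate a fixed fraction `rate` toward each new outcome."""
    estimate = start
    for outcome in outcomes:
        estimate += rate * (outcome - estimate)
    return estimate
```

For outcomes [1, 1, 0, 1], the Bayesian learner gives 4/6 ≈ 0.667 and the linear operator 0.6352. With rate α, the linear operator weights the outcome observed j steps ago by α(1 - α)^j, so it behaves like a memory that fades over roughly 1/α observations, which is one way to read the article's link between the linear approximation and limited memory.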

18.
Nested clade phylogeographical analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographical hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographical model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions, so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyse a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypotheses is known in NCPA, but not in ABC. As a consequence, the 'probabilities' generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models.

19.
Female lesser wax moths (Achroia grisella) choose males based on characters of their ultrasonic advertisement signals. Because a female's opportunity to obtain increased somatic benefits by mating with a particular male is limited, we investigated whether females obtain genetic benefits for their offspring via mate choice. Controlled breeding experiments conducted under favourable food and temperature conditions showed that developmental characters are heritable and that sire attractiveness and offspring survivorship are unrelated, but that females mating with attractive signallers produce offspring that mature faster than the offspring of females mating with non-attractive signallers. However, under some unfavourable food or temperature conditions, it is the offspring of females mating with non-attractive males that mature faster; these offspring are heavier as well. Thus, the relationship between male attractiveness and offspring development is not environmentally robust, and support for a good genes model of mate choice in A. grisella is dependent on conditions. These findings suggest genotype–environment interactions and emphasize the necessity of testing sexual selection models under a range of natural environments.

20.
Ecologists often use dispersion metrics and statistical hypothesis testing to infer processes of community formation such as environmental filtering, competitive exclusion, and neutral species assembly. These metrics have limited power in inferring assembly models because they rely on often‐violated assumptions. Here, we adapt a model of phenotypic similarity and repulsion to simulate the process of community assembly via environmental filtering and competitive exclusion, all while parameterizing the strength of the respective ecological processes. We then use random forests and approximate Bayesian computation to distinguish between these models given the simulated data. We find that our approach is more accurate than using dispersion metrics and accounts for uncertainty in model selection. We also demonstrate that the parameter determining the strength of the assembly processes can be accurately estimated. This approach is available in the R package CAMI; Community Assembly Model Inference. We demonstrate the effectiveness of CAMI using an example of plant communities living on lava flow islands.
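
Rejection ABC model choice can be sketched with a toy example: simulate communities under each candidate model with parameters drawn from their priors, keep the simulations whose summary statistic lies closest to the observed one, and read posterior model probabilities off the accepted proportions. The two assembly models, the single summary statistic (trait standard deviation), the priors and the function names below are hypothetical simplifications; CAMI additionally uses random forests and richer trait-based simulations.

```python
import random


def trait_sd(traits):
    """Sample standard deviation of community trait values."""
    mean = sum(traits) / len(traits)
    return (sum((t - mean) ** 2 for t in traits) / (len(traits) - 1)) ** 0.5


def simulate_community(model, rng, n_species):
    """Toy assembly models (hypothetical, far simpler than CAMI's)."""
    if model == 0:
        # Neutral: traits are a random draw from a uniform regional pool.
        return [rng.random() for _ in range(n_species)], None
    # Environmental filtering: traits cluster around an optimum at 0.5; the
    # width parameter encodes filtering strength (narrower = stronger).
    width = rng.uniform(0.02, 0.2)
    return [rng.gauss(0.5, width) for _ in range(n_species)], width


def abc_model_choice(observed, n_sims=20000, keep=0.01, seed=0):
    """Rejection ABC: keep the simulations closest to the observed summary."""
    rng = random.Random(seed)
    target = trait_sd(observed)
    sims = []
    for _ in range(n_sims):
        model = rng.randrange(2)  # equal prior probability for both models
        traits, width = simulate_community(model, rng, len(observed))
        sims.append((abs(trait_sd(traits) - target), model, width))
    sims.sort(key=lambda s: s[0])
    accepted = sims[:max(1, int(keep * n_sims))]
    p_filtering = sum(m for _, m, _ in accepted) / len(accepted)
    widths = [w for _, m, w in accepted if m == 1]
    return {"neutral": 1 - p_filtering,
            "filtering": p_filtering,
            "width": sum(widths) / len(widths) if widths else None}
```

The accepted filtering-model draws also give an approximate posterior for the strength parameter, mirroring the abstract's point that assembly strength can be estimated alongside model choice.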

