Similar literature
20 similar articles found.
1.
Inferences about linkage disequilibrium.   Cited by: 32 (self-citations: 0; citations by others: 32)
B S Weir. Biometrics, 1979, 35(1): 235-254
Existing theory for inferences about linkage disequilibrium is restricted to a measure defined on gametic frequencies. Unless gametic frequencies are directly observable, they are inferred from genotypic frequencies under the assumption of random union of gametes. Primary emphasis in this paper is given to genotypic data, and disequilibrium coefficients are defined for all subsets of two or more of the four genes, two at each of two loci, carried by an individual. Linkage disequilibrium coefficients are defined for genes within and between gametes, and methods of estimating and testing these coefficients are given for gametic data. For genotypic data, when coupling and repulsion double heterozygotes cannot be distinguished, Burrows' composite measure of linkage disequilibrium is discussed. In particular, the estimate for this measure and hypothesis tests based on it are compared to the usual maximum likelihood estimate of gametic linkage disequilibrium, and corresponding likelihood ratio or contingency chi-square tests. General use of the composite measure, whether or not random union of gametes is an appropriate assumption, is recommended. Attention is given to small samples, where the non-normality of gene frequencies will have greatest effect on methods of inference based on normal theory. Even tools such as Fisher's z-transformation for the correlation of gene frequencies are found to perform quite satisfactorily.
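To make the composite measure in the abstract above concrete, here is a minimal sketch of how it is commonly computed from two-locus genotype counts when double heterozygotes cannot be phased. The function name and the dictionary layout are illustrative assumptions, not taken from the paper, and no small-sample bias correction is applied.

```python
def allele_freqs_and_composite_ld(counts):
    """Estimate allele frequencies and a composite LD measure.

    counts maps (copies of allele A, copies of allele B) -> number of
    individuals, with each copy number in {0, 1, 2}. This layout is a
    hypothetical convention chosen for the sketch.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no individuals")
    # Allele frequencies from allele counts over 2n gene copies.
    p_a = sum(a * c for (a, _b), c in counts.items()) / (2 * n)
    p_b = sum(b * c for (_a, b), c in counts.items()) / (2 * n)
    # n_AB = 2*n(AABB) + n(AABb) + n(AaBB) + 0.5*n(AaBb); the double
    # heterozygote gets weight 1/2 because its phase is unknown.
    n_ab = (2 * counts.get((2, 2), 0) + counts.get((2, 1), 0)
            + counts.get((1, 2), 0) + 0.5 * counts.get((1, 1), 0))
    # Composite disequilibrium: zero under independence of the two loci.
    delta = n_ab / n - 2 * p_a * p_b
    return p_a, p_b, delta
```

Under random union of gametes the composite measure coincides with the gametic coefficient, which is why it can be used without phasing the double heterozygotes.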

2.
The methods described here make it possible to use data on sporophytic genotype frequencies to estimate the frequency of gametophytic self-fertilization in populations of homosporous plants. Bootstrap bias reduction is effective in reducing or eliminating the bias of the maximum likelihood estimate of the gametophytic selfing rate. The bias-corrected percentile method provides the most reliable confidence intervals for allele frequencies. The percentile method gives the most reliable confidence intervals for the gametophytic selfing rate when selfing is common. The maximum likelihood intervals, the percentile intervals, the bias-corrected percentile intervals, and the bootstrap t intervals are all overly conservative in their construction of confidence intervals for the gametophytic selfing rate when self-fertilization is rare. Application of the recommended methods indicates that gametophytic self-fertilization is quite rare in two sexually reproducing populations of Pellaea andromedifolia studied by Gastony and Gottlieb (1985).
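The plain percentile interval mentioned in the abstract above can be sketched as follows. This is a generic illustration applied to an allele frequency, not the authors' selfing-rate estimator; the function names, the fixed seed, and the index rule used to pick the percentiles are assumptions of the sketch, and the bias-corrected variant is not shown.

```python
import random


def percentile_bootstrap_ci(sample, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample individuals with replacement and recompute the statistic.
    stats = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


def allele_frequency(allele_copies):
    """Frequency of an allele given per-individual copy numbers (0, 1, 2)."""
    return sum(allele_copies) / (2 * len(allele_copies))
```

A call such as `percentile_bootstrap_ci([0, 1, 2, 1, 0, 1, 1, 2, 0, 1], allele_frequency)` returns a lower and an upper bound; resampling whole individuals keeps the two gene copies of each diploid together.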

3.
Empirical power of three preliminary methods for ordering loci.   Cited by: 2 (self-citations: 1; citations by others: 1)
We empirically estimated the power of three methods for ordering loci within a known linkage group. Estimates of pairwise recombination fractions and correlation coefficients were obtained from data on 50 replicates of 50 and 100 pedigrees by using a likelihood method (LIPED; Ott 1974) and the sib-pair test. Locus order then was determined using seriation, multidimensional scaling, and the product of recombination frequencies. Overall, the multidimensional scaling method was less powerful than either seriation or the product of recombination frequencies. The latter two methods were approximately equally powerful. As expected, the power of the sib-pair test was less than half that of the likelihood method.

4.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
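The idea of integrating over genotype uncertainty instead of calling genotypes can be sketched with a small EM iteration. This is an illustrative sketch only: it assumes a biallelic site, Hardy-Weinberg proportions as the genotype prior, and per-individual genotype likelihoods already computed from the reads. The function name and defaults are hypothetical, and the paper's own optimizer may differ.

```python
def em_allele_frequency(genotype_likelihoods, f_init=0.2, n_iter=100, tol=1e-10):
    """Maximum likelihood allele frequency from genotype likelihoods.

    genotype_likelihoods is a list of (L0, L1, L2) tuples, one per
    individual, where Lg is the likelihood of the read data given g
    copies of the allele. Assumes every individual has nonzero total
    weight under the current frequency.
    """
    f = f_init
    for _ in range(n_iter):
        # Hardy-Weinberg prior on genotypes given the current frequency.
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_copies = 0.0
        for lik in genotype_likelihoods:
            # E-step: posterior genotype weights for this individual.
            weights = [l * p for l, p in zip(lik, prior)]
            total = sum(weights)
            expected_copies += (weights[1] + 2 * weights[2]) / total
        # M-step: expected allele copies over 2n gene copies.
        f_new = expected_copies / (2 * len(genotype_likelihoods))
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f
```

When every individual's likelihood puts all its mass on one genotype, the estimate reduces to simple allele counting; with flat likelihoods the data carry no information and the estimate stays at its starting value.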

5.
Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive at the same optimal tree topology. Some patterns that are separately uninformative under the maximum likelihood method are separately informative under the Bayesian method. We also show that this difference has impact on the bootstrap frequencies and the posterior probabilities of topologies, which therefore are not necessarily approximately equal. Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434, 1996) stated that bootstrap frequencies can, under certain circumstances, be interpreted as posterior probabilities. This is true only if one includes a non-informative prior distribution of the possible data patterns, and most often the prior distributions are instead specified in terms of topology and branch lengths. [Bayesian inference; maximum likelihood method; Phylogeny; support.].

6.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.

7.
Polar body and oocyte typing is a new technique for gene-centromere mapping and for generating female linkage maps. A maximum likelihood approach is presented for ordering multiple markers relative to the centromere and for estimating recombination frequencies between markers and between the centromere and marker loci. Three marker-centromere orders are possible for each pair of markers: two orders when the centromere flanks the two markers and one order when the centromere is flanked by the two markers. For each possible order, the likelihood was expressed as a function of recombination frequencies for two adjacent intervals. LOD score for recombination frequency between markers or between the centromere and a marker locus was derived based on the likelihood for each gene-centromere order. The methods developed herein provide a general solution to the problem of multilocus gene-centromere mapping that involves all theoretical crossover possibilities, including four-strand double crossovers.

8.
Lars Vogt. Zoologica Scripta, 2007, 36(4): 395-407
By referring to Popperian falsificationism, proponents of cladistic parsimony claim the superiority of parsimony over likelihood. They conclude that likelihood as a statistical approach is inconsistent with falsificationism, and base their argumentation on four claims: (1) congruence tests cladograms against observational evidence and represents the most important test in phylogenetics, in which minimum‐step trees represent most corroborated trees; (2) frequency probabilities cannot be used for evaluating degree of corroboration; (3) phylogeny represents a unique process and thus frequencies cannot be applied as they require statistical reference classes that are necessarily general; (4) likelihood is a verificationist approach. After discussing the deficiencies of the cladistic phylogeneticists’ conceptualisation of the congruence test and the presentation of an alternative conceptualisation, it is shown that these four claims cannot be sustained within a falsificationist framework, and that the weighting of characters is a necessity. A differentiation between the theoretical concept of apomorphy and the epistemological concept of character weight is proposed. While apomorphies have to be independent from each other, the weighting of characters is interdependent due to human inability to distinguish organismic traits that are structurally identical though they do not share a common evolutionary origin. The possibility of this epistemological interdependence can best be dealt with by the application of process frequencies. The importance of process frequencies of specific transformation classes is exemplified in reference to Popper's formula for the measure of degree of corroboration and its consistency is shown. Therefore, the application of statistical methods is reasonable. 
As a consequence, the question whether likelihood or parsimony methods represent the best approaches in phylogenetics remains a genuinely empirical question that cannot be decided only in reference to Popper's falsificationism.

9.
Intraspecific variation is abundant in all types of systematic characters but is rarely addressed in simulation studies of phylogenetic method performance. We compared the accuracy of 15 phylogenetic methods using simulations to (1) determine the most accurate method(s) for analyzing polymorphic data (under simplified conditions) and (2) test if generalizations about the performance of phylogenetic methods based on previous simulations of fixed (nonpolymorphic) characters are robust to a very different evolutionary model that explicitly includes intraspecific variation. Simulated data sets consisted of allele frequencies that evolved by genetic drift. The phylogenetic methods included eight parsimony coding methods, continuous maximum likelihood, and three distance methods (UPGMA, neighbor joining, and Fitch-Margoliash) applied to two genetic distance measures (Nei's and the modified Cavalli-Sforza and Edwards chord distance). Two sets of simulations were performed. The first examined the effects of different branch lengths, sample sizes (individuals sampled per species), numbers of characters, and numbers of alleles per locus in the eight-taxon case. The second examined more extensively the effects of branch length in the four-taxon, two-allele case. Overall, the most accurate methods were likelihood, the additive distance methods (neighbor joining and Fitch-Margoliash), and the frequency parsimony method. Despite the use of a very different evolutionary model in the present article, many of the results are similar to those from simulations of fixed characters. Similarities include the presence of the "Felsenstein zone," where methods often fail, which suggests that long-branch attraction may occur among closely related species through genetic drift. 
Differences between the results of fixed and polymorphic data simulations include the following: (1) UPGMA is as accurate or more accurate than nonfrequency parsimony methods across nearly all combinations of branch lengths, and (2) likelihood and the additive distance methods are not positively misled under any combination of branch lengths tested (even when the assumptions of the methods are violated and few characters are sampled). We found that sample size is an important determinant of accuracy and affects the relative success of methods (i.e., distance and likelihood methods outperform parsimony at small sample sizes). Attempts to generalize about the behavior of phylogenetic methods should consider the extreme examples offered by fixed-mutation models of DNA sequence data and genetic-drift models of allele frequencies.

10.
We revisit the usual conditional likelihood for stratum-matched case-control studies and consider three alternatives that may be more appropriate for family-based gene-characterization studies: first, the prospective likelihood, that is, Pr(D|G,A); second, the retrospective likelihood, Pr(G|D); and third, the ascertainment-corrected joint likelihood, Pr(D,G|A). These likelihoods provide unbiased estimators of genetic relative risk parameters, as well as population allele frequencies and baseline risks. The parameter estimates based on the retrospective likelihood remain unbiased even when the ascertainment scheme cannot be modeled, as long as ascertainment only depends on families' phenotypes. Despite the need to estimate additional parameters, the prospective, retrospective, and joint likelihoods can lead to considerable gains in efficiency, relative to the conditional likelihood, when estimating genetic relative risk. This is true if baseline risks and allele frequencies can be assumed to be homogeneous. In the presence of heterogeneity, however, the parameter estimates assuming homogeneity can be seriously biased. We discuss the extent of this problem and present a mixed models approach for providing consistent parameter estimates when baseline risks and allele frequencies are heterogeneous. The efficiency gains of the mixed-model prospective, retrospective, and joint likelihoods relative to the efficiency of conditional likelihood are small in the situations presented here.

11.
Design and analysis methods are presented for studying the association of a candidate gene with a disease by using parental data in place of nonrelated controls. This alternative design eliminates spurious differences in allele frequencies between cases and nonrelated controls resulting from different ethnic origins and population stratification for these two groups. We present analysis methods which are based on two genetic relative risks: (1) the relative risk of disease for homozygotes with two copies of the candidate gene versus homozygotes without the candidate gene and (2) the relative risk for heterozygotes with one copy of the candidate gene versus homozygotes without the candidate gene. In addition to estimating the magnitude of these relative risks, likelihood methods allow specific hypotheses to be tested, namely, a test for overall association of the candidate gene with disease, as well as specific genetic hypotheses, such as dominant or recessive inheritance. Two likelihood methods are presented: (1) a likelihood method appropriate when Hardy-Weinberg equilibrium holds and (2) a likelihood method in which we condition on parental genotype data when Hardy-Weinberg equilibrium does not hold. The results for the relative efficiency of these two methods suggest that the conditional approach may at times be preferable, even when equilibrium holds. Sample-size and power calculations are presented for a multitiered design. The purpose of tier 1 is to detect the presence of an abnormal sequence for a postulated candidate gene among a small group of cases. The purpose of tier 2 is to test for association of the abnormal variant with disease, such as by the likelihood methods presented. The purpose of tier 3 is to confirm positive results from tier 2. 
Results indicate that required sample sizes are smaller when expression of disease is recessive, rather than dominant, and that, for recessive disease and large relative risks, necessary sample sizes may be feasible, even if only a small percentage of the disease can be attributed to the candidate gene.

12.
The assumption of Hardy-Weinberg equilibrium (HWE) is generally required for association analysis using a case-control design on autosomes; otherwise, the test size may be inflated. There has been increasing interest in exploring the association between diseases and markers on the X chromosome, and the effect of departure from HWE on association analysis on the X chromosome. Note that there are two hypotheses of interest regarding the X chromosome: (i) the frequencies of the same allele at a locus in males and females are equal and (ii) the inbreeding coefficient in females is zero (without excess homozygosity). Thus, excess homozygosity and significantly different minor allele frequencies between males and females are used to filter X-linked variants. There are two existing methods to test for (i) and (ii), respectively. However, their sizes and powers have not yet been studied. Further, no existing method tests both hypotheses simultaneously. Therefore, in this article, we propose a novel likelihood ratio test for both (i) and (ii) on the X chromosome. To further investigate the underlying reason why the null hypothesis is statistically rejected, we also develop two likelihood ratio tests for detecting (i) and (ii), respectively. Moreover, we explore the effect of population stratification on the proposed tests. From our simulation study, the size of the test for (i) is close to the nominal significance level. However, the size of the excess homozygosity test and the test for both (i) and (ii) is conservative. So, we propose parametric bootstrap techniques to evaluate their validity and performance. Simulation results show that the proposed methods with bootstrap techniques control the size well under the respective null hypothesis. Power comparison demonstrates that the methods with bootstrap techniques are more powerful than those without the bootstrap procedure and the existing methods.
The application of the proposed methods to a rheumatoid arthritis dataset indicates their utility.

13.
B R Smith, C M Herbinger, H R Merry. Genetics, 2001, 158(3): 1329-1338
Two Markov chain Monte Carlo algorithms are proposed that allow the partitioning of individuals into full-sib groups using single-locus genetic marker data when no parental information is available. These algorithms present a method of moving through the sibship configuration space and locating the configuration that maximizes an overall score on the basis of pairwise likelihood ratios of being full-sib or unrelated or maximizes the full joint likelihood of the proposed family structure. Using these methods, up to 757 out of 759 Atlantic salmon were correctly classified into 12 full-sib families of unequal size using four microsatellite markers. Large-scale simulations were performed to assess the sensitivity of the procedures to the number of loci and number of alleles per locus, the allelic distribution type, the distribution of families, and the independent knowledge of population allelic frequencies. The number of loci and the number of alleles per locus had the most impact on accuracy. Very good accuracy can be obtained with as few as four loci when they have at least eight alleles. Accuracy decreases when using allelic frequencies estimated in small target samples with skewed family distributions with the pairwise likelihood approach. We present an iterative approach that partly corrects that problem. The full likelihood approach is less sensitive to the precision of allelic frequency estimates but did not perform as well with the large data set or when little information was available (e.g., four loci with four alleles).

14.
Several different methods for linkage analysis are shown to arise from a single likelihood function L for the observed allele-sharing data at multiple markers in a chromosomal region. These include classical parametric lod score methods, nonparametric or "model-free" affected pedigree-member (APM) methods, and the Gaussian process method. Setting the methods in the context of the likelihood function L clarifies their underlying assumptions. A test statistic derived from L, the efficient score statistic, is introduced. It is asymptotically equivalent to the lod score, but it can be easier to compute when the penetrances and frequencies of alleles of the trait gene are not known. APM test statistics and the Gaussian lod score are shown to be special cases of efficient score statistics. This unified framework facilitates exploration of a range of models for the effects of a putative trait-predisposing gene, and it facilitates sensitivity analyses to examine the consequences of model misspecification.

15.
Nuclear SSRs are notorious for having relatively high frequencies of null alleles, i.e. alleles that fail to amplify and are thus recessive and undetected in heterozygotes. In this paper, we compare two kinds of approaches for estimating null allele frequencies at seven nuclear microsatellite markers in three French Fagus sylvatica populations: (1) maximum likelihood methods that compare observed and expected homozygote frequencies in the population under the assumption of Hardy-Weinberg equilibrium and (2) direct null allele frequency estimates from progeny where parent genotypes are known. We show that null allele frequencies are high in F. sylvatica (7.0% on average with the population method, 5.1% with the progeny method), and that estimates are consistent between the two approaches, especially when the number of sampled maternal half-sib progeny arrays is large. With null allele frequencies ranging between 5% and 8% on average across loci, population genetic parameters such as genetic differentiation (F_ST) may be mostly unbiased. However, using markers with such average prevalence of null alleles (up to 15% for some loci) can be seriously misleading in fine scale population studies and parentage analysis.
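As a rough illustration of the population-based idea in the abstract above (a shortfall of observed heterozygotes relative to Hardy-Weinberg expectation being attributed to a null allele), the sketch below uses a simple closed-form estimator of the kind usually attributed to Brookfield (1996). It is a stand-in, not the maximum likelihood procedure the abstract refers to, and the function name is hypothetical.

```python
def null_allele_frequency(expected_het, observed_het):
    """Moment-style null allele frequency from heterozygote deficiency.

    expected_het is the Hardy-Weinberg expected heterozygosity and
    observed_het the observed heterozygosity at one locus. Assumes the
    deficiency is caused only by a null allele (no inbreeding, no
    substructure), which is a strong assumption.
    """
    if not (0.0 <= expected_het <= 1.0 and 0.0 <= observed_het <= 1.0):
        raise ValueError("heterozygosities must lie in [0, 1]")
    r = (expected_het - observed_het) / (1.0 + expected_het)
    # A heterozygote excess gives a negative value; report zero instead.
    return max(0.0, r)
```

Because any cause of homozygote excess inflates this estimate, comparing it with progeny-based estimates, as the abstract does, is a useful cross-check.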

16.
Modeling residue usage in aligned protein sequences via maximum likelihood   Cited by: 9 (self-citations: 6; citations by others: 3)
A computational method is presented for characterizing residue usage, i.e., site-specific residue frequencies, in aligned protein sequences. The method obtains frequency estimates that maximize the likelihood of the sequences in a simple model for sequence evolution, given a tree or a set of candidate trees computed by other methods. These maximum-likelihood frequencies constitute a profile of the sequences, and thus the method offers a rigorous alternative to sequence weighting for constructing such a profile. The ability of this method to discard misleading phylogenetic effects allows the biochemical propensities of different positions in a sequence to be more clearly observed and interpreted.

17.
It was shown recently using experimental data that it is possible under certain conditions to determine whether a person with known genotypes at a number of markers was part of a sample from which only allele frequencies are known. Using population genetic and statistical theory, we show that the power of such identification is, approximately, proportional to the number of independent SNPs divided by the size of the sample from which the allele frequencies are available. We quantify the limits of identification and propose likelihood and regression analysis methods for the analysis of data. We show that these methods have similar statistical properties and have more desirable properties, in terms of type-I error rate and statistical power, than test statistics suggested in the literature.

18.
Wang J, Whitlock MC. Genetics, 2003, 163(1): 429-446
In the past, moment and likelihood methods have been developed to estimate the effective population size (N(e)) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of N(e) if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of N(e) and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of N(e), respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both N(e) and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.
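For orientation, the classical moment estimator that the abstract above says it extends can be sketched as below. It covers only the single isolated population case, with no migration term, and it assumes the standardized variance of allele frequency change in the Nei-Tajima form with a correction for the two sample sizes. The function and parameter names are hypothetical, and this is not the joint N(e) and m estimator of the paper.

```python
def temporal_ne(freqs_t0, freqs_t1, generations, s0, s1):
    """Moment estimate of effective size from two temporal samples.

    freqs_t0 and freqs_t1 are allele frequencies at the same alleles in
    the first and second sample; s0 and s1 are the numbers of diploid
    individuals sampled. Alleles that are absent or fixed in both
    samples should be dropped beforehand, since they give a zero
    denominator.
    """
    k = len(freqs_t0)
    # Standardized variance of allele frequency change, averaged over alleles.
    fc = sum((x - y) ** 2 / ((x + y) / 2 - x * y)
             for x, y in zip(freqs_t0, freqs_t1)) / k
    # Remove the part of the variance that is due to sampling alone.
    drift = fc - 1 / (2 * s0) - 1 / (2 * s1)
    if drift <= 0:
        # Change no larger than sampling noise: no finite estimate.
        return float("inf")
    return generations / (2 * drift)
```

Ignoring migration in this formula is exactly the source of the bias the abstract describes, so the sketch is best read as the baseline being improved upon.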

19.
In spite of the usefulness of codominant markers in population genetics, the existence of null alleles raises challenging estimation issues in natural populations that are characterized by positive inbreeding coefficients (F > 0). Disregarding the possibility of F > 0 in a population will generally lead to overestimates of null allele frequencies. Conversely, estimates of inbreeding coefficients (F) may be strongly biased upwards (excess homozygotes), in the presence of nontrivial frequencies of null alleles. An algorithm has been presented for the estimation of null allele frequencies in inbred populations (van Oosterhout method), using external estimates of the F‐statistics. The goal of this study is to introduce a modification of this method and to provide a formal comparison with an alternative likelihood‐based method (Chybicki‐Burczyk). Using simulated data, we illustrate the strengths and limitations of these competing methods. Under most circumstances, the likelihood method is preferable, but for highly inbred organisms, a modified van Oosterhout method offers some advantages.

20.
Several studies have reported optimal population decoding of sensory responses in two-alternative visual discrimination tasks. Such decoding involves integrating noisy neural responses into a more reliable representation of the likelihood that the stimuli under consideration evoked the observed responses. Importantly, an ideal observer must be able to evaluate likelihood with high precision and only consider the likelihood of the two relevant stimuli involved in the discrimination task. We report a new perceptual bias suggesting that observers read out the likelihood representation with remarkably low precision when discriminating grating spatial frequencies. Using spectrally filtered noise, we induced an asymmetry in the likelihood function of spatial frequency. This manipulation mainly affects the likelihood of spatial frequencies that are irrelevant to the task at hand. Nevertheless, we find a significant shift in perceived grating frequency, indicating that observers evaluate likelihoods of a broad range of irrelevant frequencies and discard prior knowledge of stimulus alternatives when performing two-alternative discrimination.
