Similar Articles (20 results)
1.
The problem of ascertainment for linkage analysis.
It is generally believed that ascertainment corrections are unnecessary in linkage analysis, provided individuals are selected for study solely on the basis of trait phenotype and not on the basis of marker genotype. The theoretical rationale for this is that standard linkage analytic methods involve conditioning likelihoods on all the trait data, which may be viewed as an application of the ascertainment assumption-free (AAF) method of Ewens and Shute. In this paper, we show that when the observed pedigree structure depends on which relatives within a pedigree happen to have been the probands (proband-dependent, or PD, sampling), conditioning on all the trait data is not a valid application of the AAF method and will result in asymptotically biased estimates of genetic parameters (except under single ascertainment). Furthermore, this result holds even if the recombination fraction R is the only parameter of interest. Since the lod score is proportional to the likelihood of the marker data conditional on all the trait data, this means that when data are obtained under PD sampling the lod score will yield asymptotically biased estimates of R, and that so-called mod scores (i.e., lod scores maximized over both R and parameters theta of the trait distribution) will yield asymptotically biased estimates of R and theta. Furthermore, the problem appears to be intractable, in the sense that it is not possible to formulate the correct likelihood conditional on observed pedigree structure. In this paper we do not investigate the numerical magnitude of the bias, which may be small in many situations. On the other hand, virtually all linkage data sets are collected under PD sampling. Thus, the existence of this bias will be the rule rather than the exception in the usual applications.
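For reference, the quantities discussed above can be written compactly; the notation below (M for marker data, T for trait data, theta for the trait-model parameters) is mine rather than the article's:

```latex
\mathrm{lod}(R;\theta) \;=\; \log_{10}
   \frac{\Pr(M \mid T;\, R, \theta)}{\Pr(M \mid T;\, R=\tfrac12, \theta)},
\qquad
\mathrm{mod} \;=\; \max_{R,\,\theta}\ \mathrm{lod}(R;\theta).
```

The article's point is that under proband-dependent sampling, Pr(M | T) is no longer the correct conditional likelihood, so both maximizers inherit an asymptotic bias.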

2.
A stepwise logistic-regression procedure is proposed for evaluation of the relative importance of variants at different sites within a small genetic region. By fitting statistical models with main effects, rather than modeling the full haplotype effects, we generate tests, with few degrees of freedom, that are likely to be powerful for detecting primary etiological determinants. The approach is applicable to either case/control or nuclear-family data, with case/control data modeled via unconditional and family data via conditional logistic regression. Four different conditioning strategies are proposed for evaluation of effects at multiple, closely linked loci when family data are used. The first strategy results in a likelihood that is equivalent to analysis of a matched case/control study with each affected offspring matched to three pseudocontrols, whereas the second strategy is equivalent to matching each affected offspring with between one and three pseudocontrols. Both of these strategies require that parental phase (i.e., the haplotypes present in the parents) can be inferred. Families in which phase cannot be determined must be discarded, which can considerably reduce the effective size of a data set, particularly when large numbers of loci that are not very polymorphic are being considered. Therefore, a third strategy is proposed in which knowledge of parental phase is not required, which allows those families with ambiguous phase to be included in the analysis. The fourth and final strategy is to use conditioning method 2 when parental phase can be inferred and to use conditioning method 3 otherwise. The methods are illustrated using nuclear-family data to evaluate the contribution of loci in the HLA region to the development of type 1 diabetes.
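As an illustration of the first conditioning strategy, the sketch below (my own code and naming, not the authors') builds an affected offspring's genotype and its three pseudocontrols from the parental transmitted/untransmitted haplotypes and evaluates one family's conditional-logistic log-likelihood contribution; the covariate coding x is left to the caller:

```python
import numpy as np

def pseudocontrols(transmitted, untransmitted):
    """Case genotype plus the three pseudocontrols implied by parental phase.
    transmitted   = (haplotype passed by the father, haplotype passed by the mother)
    untransmitted = (untransmitted paternal haplotype, untransmitted maternal haplotype)"""
    tf, tm = transmitted
    uf, um = untransmitted
    case = (tf, tm)
    pseudo = [(tf, um), (uf, tm), (uf, um)]   # the three non-transmitted combinations
    return case, pseudo

def family_loglik(beta, x_case, x_pseudo):
    """Conditional-logistic contribution of one matched set (case + 3 pseudocontrols)."""
    scores = np.array([x_case @ beta] + [x @ beta for x in x_pseudo])
    return scores[0] - np.log(np.exp(scores).sum())
```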

3.
A statistical model is presented for dealing with genotypic frequency data obtained from a single population observed over a run of consecutive generations. This model takes into account possible correlations that exist between generations by conditioning the marginal probability distribution of any one generation on the previously observed generation. Maximum likelihood estimates of the fitness parameters are derived and a hypothesis testing framework developed. The model is very general, and in this paper is applied to random-mating, selfing, parthenogenetic and mixed random-mating and selfing populations with respect to a single-locus, g-allele model with constant genotypic fitness differences with all selection occurring either before or after sampling. The assumptions behind this model are contrasted with those of alternative techniques such as minimum chi-square or unconditional maximum likelihood estimation when the marginal likelihoods for any one generation are conditioned only on the initial conditions and not the previous generation. The conditional model is most appropriate when the sample size per generation is large either in an absolute sense or in relation to the total population size. Minimum chi-square and the unconditional likelihood are most appropriate when the population size is effectively infinite and the samples are small. Both models are appropriate when the samples are large and the population size is effectively infinite. Under these last conditions, the conditional model may be preferred because it has greater robustness with respect to small deviations from the underlying assumptions and has a greater simplicity of form. Furthermore, if any genetic drift occurs in the experiment, the minimum chi-square and unconditional likelihood approaches can create spurious evidence for selection while the conditional approach will not. Worked examples are presented. This study was supported in part by the U.S. Atomic Energy Commission, Contract AT(11-1)-1552 to the Department of Human Genetics (CFS), University of Michigan, and by National Science Foundation Grant BMS 74-17453 awarded to the author.
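A worked form of the conditioning idea, in my own notation rather than the paper's: writing n_t for the genotype counts sampled in generation t, N_t for the sample size, and p_g(n_{t-1}; w) for the expected frequency of genotype g obtained by applying the fitnesses w and the mating system to the previously observed generation, the conditional likelihood multiplies multinomial factors across generations:

```latex
L(w) \;=\; \prod_{t=1}^{T} \Pr\!\left(\mathbf{n}_t \mid \mathbf{n}_{t-1};\, w\right)
      \;=\; \prod_{t=1}^{T} \binom{N_t}{\mathbf{n}_t}
            \prod_{g} \bigl[\,p_g(\mathbf{n}_{t-1}; w)\,\bigr]^{\,n_{t,g}} .
```

The unconditional alternative described in the abstract instead uses expected frequencies propagated from the initial conditions only.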

4.
Several different methodologies for parameter estimation under various ascertainment sampling schemes have been proposed in the past. In this article, some of the methodologies that have been proposed for independent sibships under the classical segregation analysis model are synthesized, and the general likelihoods derived for single, multiple and complete ascertainment. The issue of incorporating the sibship size distribution into the analysis is addressed, and the effect of conditioning the likelihood on the observed sibship sizes is discussed. It is shown that when the number of probands in a sibship is not specified, the corresponding likelihood can be used for a broader class of ascertainment schemes than is subsumed by the classical model.
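For concreteness, the classical proband-probability form of these likelihoods (standard textbook material rather than a quotation from this article) conditions on ascertainment of a sibship of size s with r affected, with pi the probability that an affected child becomes a proband and P(r | s; p) the segregation model:

```latex
\Pr(r \mid s,\ \text{ascertained}) \;=\;
   \frac{\Pr(r \mid s;\, p)\,\bigl[\,1-(1-\pi)^{r}\,\bigr]}
        {\sum_{r'=1}^{s} \Pr(r' \mid s;\, p)\,\bigl[\,1-(1-\pi)^{r'}\,\bigr]} .
```

Letting pi tend to 0 recovers single ascertainment (each term weighted by r), while pi = 1 gives complete ascertainment (simple truncation of the r = 0 class); intermediate pi corresponds to multiple ascertainment.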

5.
D Gianola, R L Fernando, S Im, J L Foulley. Génome, 1989, 31(2): 768-777.
Conceptual aspects of estimation of genetic components of variance and covariance under selection are discussed, with special attention to likelihood methods. Certain selection processes are described and alternative likelihoods that can be used for analysis are specified. There is a mathematical relationship between the likelihoods that permits comparing the relative amount of information contained in them. Theoretical arguments and evidence indicate that point inferences made from likelihood functions are not affected by some forms of selection.

6.
We revisit the usual conditional likelihood for stratum-matched case-control studies and consider three alternatives that may be more appropriate for family-based gene-characterization studies: first, the prospective likelihood, that is, Pr(D/G,A); second, the retrospective likelihood, Pr(G/D); and third, the ascertainment-corrected joint likelihood, Pr(D,G/A). These likelihoods provide unbiased estimators of genetic relative risk parameters, as well as population allele frequencies and baseline risks. The parameter estimates based on the retrospective likelihood remain unbiased even when the ascertainment scheme cannot be modeled, as long as ascertainment depends only on the families' phenotypes. Despite the need to estimate additional parameters, the prospective, retrospective, and joint likelihoods can lead to considerable gains in efficiency, relative to the conditional likelihood, when estimating genetic relative risk. This is true if baseline risks and allele frequencies can be assumed to be homogeneous. In the presence of heterogeneity, however, the parameter estimates assuming homogeneity can be seriously biased. We discuss the extent of this problem and present a mixed-models approach for providing consistent parameter estimates when baseline risks and allele frequencies are heterogeneous. The efficiency gains of the mixed-model prospective, retrospective, and joint likelihoods relative to the efficiency of the conditional likelihood are small in the situations presented here.

7.
Paternity inference using highly polymorphic codominant markers is becoming common in the study of natural populations. However, multiple males are often found to be genetically compatible with each offspring tested, even when the probability of excluding an unrelated male is high. While various methods exist for evaluating the likelihood of paternity of each nonexcluded male, interpreting these likelihoods has hitherto been difficult, and no method takes account of the incomplete sampling and error-prone genetic data typical of large-scale studies of natural systems. We derive likelihood ratios for paternity inference with codominant markers taking account of typing error, and define a statistic Δ for resolving paternity. Using allele frequencies from the study population in question, a simulation program generates criteria for Δ that permit assignment of paternity to the most likely male with a known level of statistical confidence. The simulation takes account of the number of candidate males, the proportion of males that are sampled and gaps and errors in genetic data. We explore the potentially confounding effect of relatives and show that the method is robust to their presence under commonly encountered conditions. The method is demonstrated using genetic data from the intensively studied red deer (Cervus elaphus) population on the island of Rum, Scotland. The Windows-based computer program, CERVUS, described in this study is available from the authors. CERVUS can be used to calculate allele frequencies, run simulations and perform parentage analysis using data from all types of codominant markers.
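A minimal sketch of the Δ idea (my own simplification, not the CERVUS implementation): Δ is the difference in LOD score between the most likely and second most likely candidate male, and a critical value is read off simulations in which the true father is known.

```python
import numpy as np

def delta_statistic(lod_scores):
    """Difference in LOD between the top two candidate males (Delta)."""
    ranked = np.sort(np.asarray(lod_scores, dtype=float))[::-1]
    return ranked[0] if ranked.size == 1 else ranked[0] - ranked[1]

def critical_delta(sim_deltas, sim_correct, confidence=0.95):
    """Smallest Delta threshold such that, among simulated tests with Delta at
    or above it, the assignment is correct with the requested confidence
    (a simplified stand-in for the published simulation criteria)."""
    order = np.argsort(sim_deltas)[::-1]
    correct = np.asarray(sim_correct, dtype=float)[order]
    rate = np.cumsum(correct) / np.arange(1, correct.size + 1)
    deltas = np.asarray(sim_deltas, dtype=float)[order]
    ok = np.where(rate >= confidence)[0]
    return deltas[ok[-1]] if ok.size else np.inf
```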

8.
Estimating evolutionary parameters when viability selection is operating
Some individuals die before a trait is measured or expressed (the invisible fraction), and some relevant traits are not measured in any individual (missing traits). This paper discusses how these concepts can be cast in terms of missing data problems from statistics. Using missing data theory, I show formally the conditions under which a valid evolutionary inference is possible when the invisible fraction and/or missing traits are ignored. These conditions are restrictive and unlikely to be met in even the most comprehensive long-term studies. When these conditions are not met, many selection and quantitative genetic parameters cannot be estimated accurately unless the missing data process is explicitly modelled. Surprisingly, this does not seem to have been attempted in evolutionary biology. In the case of the invisible fraction, viability selection and the missing data process are often intimately linked. In such cases, models used in survival analysis can be extended to provide a flexible and justified model of the missing data mechanism. Although missing traits pose a more difficult problem, important biological parameters can still be estimated without bias when appropriate techniques are used. This is in contrast to current methods, which have large biases and poor precision. Generally, the quantitative genetic approach is shown to be superior to phenotypic studies of selection when invisible fractions or missing traits exist because part of the missing information can be recovered from relatives.

9.
We present a conditional likelihood approach for testing linkage disequilibrium in nuclear families having multiple affected offspring. The likelihood, conditioned on the identity-by-descent (IBD) structure of the sibling genotypes, is unaffected by familial correlation in disease status that arises from linkage between a marker locus and the unobserved trait locus. Two such conditional likelihoods are compared: one that conditions on IBD and phase of the transmitted alleles and a second which conditions only on IBD of the transmitted alleles. Under the log-additive model, the first likelihood is equivalent to the allele-counting methods proposed in the literature. The second likelihood is valid under the added assumption of equal male and female recombination fractions. In a simulation study, we demonstrated that in sibships having two or three affected siblings the score test from each likelihood had the correct test size for testing disequilibrium. They also led to equivalent power to detect linkage disequilibrium at the 5% significance level.

10.
11.
On Ewens' equivalence theorem for ascertainment sampling schemes
The usual likelihood formulations for segregation analysis of a genetic trait ignore both the at-risk but unobservable families and the demographic structure of the surrounding population. Families are not ascertained if, by chance, they have no affected members or if the affected members are not ascertained. Ewens has shown that likelihoods that take explicit account of both unobservable families and demographic parameters lead to the same maximum likelihood estimates of segregation and ascertainment parameters as the usual likelihoods. This paper provides an alternative proof of Ewens' theorem based on the Poisson distribution and simple continuous optimization techniques.

12.
Estimation of variance components in linear mixed models is important in clinical trial and longitudinal data analysis. It is also important in animal and plant breeding for accurately partitioning total phenotypic variance into genetic and environmental variances. The restricted maximum likelihood (REML) method is often preferred over the maximum likelihood (ML) method for variance component estimation because REML takes into account the degrees of freedom lost in estimating the fixed effects. The original restricted likelihood function involves a linear transformation of the original response variable (a collection of error contrasts). Harville's final form of the restricted likelihood function does not involve the transformation and thus is much easier to manipulate than the original restricted likelihood function. There are several different ways to show that the two forms of the restricted likelihood are equivalent. In this study, I present a much simpler way to derive Harville's restricted likelihood function. I first treat the fixed effects as random effects and call such a mixed model a pseudo random model (PDRM). I then construct a likelihood function for the PDRM. Finally, I let the variance of the pseudo random effects go to infinity and show that the limit of the likelihood function of the PDRM is the restricted likelihood function.
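In standard mixed-model notation (mine, not the article's), with y = Xβ + Zu + e and V = Var(y) depending on the variance components θ, Harville's form of the restricted log-likelihood is:

```latex
\ell_R(\theta) \;=\; -\tfrac12\Bigl[\,\log|V| \;+\; \log\bigl|X^{\mathsf T}V^{-1}X\bigr|
        \;+\; y^{\mathsf T}Py\,\Bigr],
\qquad
P \;=\; V^{-1} - V^{-1}X\bigl(X^{\mathsf T}V^{-1}X\bigr)^{-1}X^{\mathsf T}V^{-1}.
```

Treating β as random with β ~ N(0, σ_β² I) turns the marginal covariance into V + σ_β² XXᵀ; after discarding an additive term in log σ_β² that does not involve θ, letting σ_β² go to infinity makes the pseudo-random-model log-likelihood converge to ℓ_R(θ), which is the limit described in the abstract.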

13.
The population genetic study of divergence is often carried out using a Bayesian genealogy sampler, like those implemented in ima2 and related programs, and these analyses frequently include a likelihood ratio test of the null hypothesis of no migration between populations. Cruickshank and Hahn (2014, Molecular Ecology, 23, 3133–3157) recently reported a high rate of false-positive test results with ima2 for data simulated with small numbers of loci under models with no migration and recent splitting times. We confirm these findings and discover that they are caused by a failure of the assumptions underlying likelihood ratio tests that arises when using marginal likelihoods for a subset of model parameters. We also show that for small data sets, with little divergence between samples from two populations, an excellent fit can often be found both by a model with a low migration rate and a recent splitting time and by a model with a high migration rate and a deep splitting time.

14.
Cook RJ, Farewell VT. Biometrics, 1999, 55(1): 284-288.
We highlight a feature of likelihood-based methods that provides flexibility in model formulation and inference. In particular, overall likelihoods that consist of likelihood contributions with different forms are considered. The particular forms may be predetermined by design criteria or may be selected based on features of the data. Inferences based on such mixed-form likelihoods are valid provided standard regularity conditions hold and the parameters of interest have the same interpretation in the various forms. The advantages of constructing overall likelihoods in this way are illustrated by applications involving the analysis of 2 × 2 × K tables and left-censored water quality data.
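As a small illustration of a mixed-form likelihood in the left-censoring setting mentioned above (a sketch under an assumed normal model; the data and variable names are hypothetical): detected measurements contribute a density term and values below the detection limit contribute a distribution-function term.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(params, y, detected):
    """Mixed-form negative log-likelihood: log f(y) for detected values,
    log F(detection limit) for left-censored ones."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    ll = np.where(detected,
                  norm.logpdf(y, mu, sigma),   # observed contribution
                  norm.logcdf(y, mu, sigma))   # censored contribution (y holds the limit)
    return -ll.sum()

# Hypothetical usage: values of 0.5 mark the detection limit, not real measurements.
y = np.array([1.2, 0.8, 0.5, 0.5, 2.1, 0.5])
detected = np.array([True, True, False, False, True, False])
fit = minimize(neg_loglik, x0=np.array([1.0, 0.0]), args=(y, detected))
```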

15.
We present an alternative method for calculating likelihoods in molecular phylogenetics. Our method is based on partial likelihood tensors, which are generalizations of partial likelihood vectors, as used in Felsenstein's approach. Exploiting a lexicographic sorting and partial likelihood tensors, it is possible to obtain significant computational savings. We show this on a range of simulated data by enumerating all numerical calculations that are required by our method and the standard approach.
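For orientation, a sketch of the standard partial-likelihood-vector recursion (Felsenstein's pruning algorithm) that the tensor approach generalizes; the data structures here are my own choices, not the paper's:

```python
import numpy as np

def partial_likelihood(node, children, tip_states, n_states=4):
    """Return L[s] = Pr(observed tip data below `node` | state s at `node`).
    children[node]  -> list of (child, P) pairs, where P is the transition
                       matrix along the branch leading to that child;
    tip_states[tip] -> observed state index at a tip (single site)."""
    if node in tip_states:                       # tip: indicator vector
        L = np.zeros(n_states)
        L[tip_states[node]] = 1.0
        return L
    L = np.ones(n_states)
    for child, P in children[node]:
        Lc = partial_likelihood(child, children, tip_states, n_states)
        L *= P @ Lc                              # sum over the child's states
    return L

# The site likelihood is the root vector averaged over the root distribution pi:
# site_lik = pi @ partial_likelihood(root, children, tip_states)
```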

16.
For pedigrees with multiple loops, exact likelihoods cannot be computed in an acceptable time frame, and thus approximate methods are used. Some of these methods are based on breaking loops and approximating the complex pedigree likelihood by the exact likelihood of the corresponding zero-loop pedigree. Because loops are ignored, this approach results in a loss of genetic information and a decrease in the power to detect linkage. To minimize this loss, an optimal set of loop breakers has to be selected. In this paper, we present a graph-theory-based algorithm for automatic selection of an optimal set of loop breakers. We propose using the total relationship between measured pedigree members as a proxy for power. To minimize the loss of genetic information, we suggest selecting breakers whose duplication in a pedigree would be accompanied by a minimal loss of total relationship between measured pedigree members. We show that our algorithm compares favorably with other existing loop-breaker selection algorithms in terms of conservation of genetic information, statistical power and CPU time of subsequent linkage analysis. We implemented our method in a software package LOOP_EDGE, which is available at http://mga.bionet.nsc.ru/nlru/.

17.

Background: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms, which allow one to obtain parameter posterior distributions based on simulations that do not require likelihood computations.
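A minimal rejection-ABC sketch of the idea described in this Background (a generic illustration, not the authors' algorithm; `simulate` and `prior_sample` are user-supplied, hypothetical functions):

```python
import numpy as np

def abc_rejection(observed_stats, simulate, prior_sample, n_sims=100_000, eps=0.1):
    """Keep parameter draws whose simulated summary statistics fall within
    eps of the observed ones; the accepted draws approximate the posterior."""
    obs = np.asarray(observed_stats, dtype=float)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample()                    # draw from the prior
        stats = np.asarray(simulate(theta), dtype=float)
        if np.linalg.norm(stats - obs) < eps:     # no likelihood evaluation needed
            accepted.append(theta)
    return accepted
```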

18.
Wang J. Genetics, 2012, 191(1): 183-194.
Quite a few methods have been proposed to infer sibship and parentage among individuals from their multilocus marker genotypes. They are all based on Mendelian laws, either qualitatively (exclusion methods) or quantitatively (likelihood methods), have different optimization criteria, and use different algorithms in searching for the optimal solution. The full-likelihood method assigns sibship and parentage relationships among all sampled individuals jointly. It is by far the most accurate method, but is computationally prohibitive for large data sets with many individuals and many loci. In this article I propose a new likelihood-based method that is computationally efficient enough to handle large data sets. The method uses the sum of the log likelihoods of pairwise relationships in a configuration as the score to measure its plausibility, where log likelihoods of pairwise relationships are calculated only once and stored for repeated use. By analyzing several empirical and many simulated data sets, I show that the new method is more accurate than pairwise likelihood and exclusion-based methods, but is slightly less accurate than the full-likelihood method. However, the new method is computationally much more efficient than the full-likelihood method; for the cases where both sexes are polygamous and markers have genotyping errors, it can be several orders of magnitude faster. The new method can handle a large sample with thousands of individuals, with the number of markers limited only by computer memory.
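The pairwise building block can be sketched as follows (my own implementation of the standard single-locus pair likelihood under Hardy-Weinberg, not the author's code): the probability of an unordered genotype pair is a mixture over IBD states weighted by the relationship's IBD coefficients, and a configuration's score sums the log pairwise likelihoods.

```python
import numpy as np
from itertools import combinations

def pair_prob(g1, g2, freqs, k):
    """Pr(genotypes g1, g2 at one locus | IBD coefficients k = (k0, k1, k2))."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    hwe = lambda g: freqs[g[0]] ** 2 if g[0] == g[1] else 2 * freqs[g[0]] * freqs[g[1]]
    p0 = hwe(g1) * hwe(g2)                       # no alleles shared IBD
    p2 = hwe(g1) if g1 == g2 else 0.0            # both alleles shared IBD
    p1 = 0.0                                     # one allele shared IBD
    for s in freqs:                              # the shared allele
        for x in freqs:                          # free allele of individual 1
            for y in freqs:                      # free allele of individual 2
                if tuple(sorted((s, x))) == g1 and tuple(sorted((s, y))) == g2:
                    p1 += freqs[s] * freqs[x] * freqs[y]
    return k[0] * p0 + k[1] * p1 + k[2] * p2

def configuration_score(genotypes, freqs, k_for_pair):
    """Sum of log pairwise likelihoods over all pairs (single locus shown);
    k_for_pair(i, j) returns the IBD coefficients of the hypothesized relationship."""
    return sum(np.log(pair_prob(genotypes[i], genotypes[j], freqs, k_for_pair(i, j)))
               for i, j in combinations(range(len(genotypes)), 2))

FULL_SIBS = (0.25, 0.5, 0.25)    # Pr(sharing 0, 1, 2 alleles IBD)
UNRELATED = (1.0, 0.0, 0.0)
```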

19.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data is collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data is collected from a series of consecutive regions of equal size. Then it is shown that the observed data forms a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.
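In generic notation (mine, not the article's), with the data split into K consecutive regions of equal size and L_i the likelihood of the data in region i, the composite likelihood and its maximizer are:

```latex
CL(\theta) \;=\; \prod_{i=1}^{K} L_i(\theta),
\qquad
\hat\theta_{\mathrm{MCL}} \;=\; \arg\max_{\theta}\ \sum_{i=1}^{K} \log L_i(\theta).
```

The consistency result described above concerns this maximizer as K tends to infinity.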

20.
The program, which is written in FORTRAN, estimates haplotype frequencies in two-locus and three-locus genetic systems from population diploid data. It is based on the gene counting method, which leads to maximum likelihood estimates, and can be used whenever the possible antigens (one or more) on each chromosome can be specified for each person and for each locus, i.e., ABO-like systems and inclusions are permitted. The number of alleles per locus may be rather large, and both grouped and ungrouped data can be used. Log likelihoods are calculated on the basis of various assumptions, so that likelihood ratio tests can be carried out.
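A generic sketch of the gene-counting (EM) algorithm for the two-locus case (my own Python illustration; the FORTRAN program itself also handles three loci, ABO-like ambiguity and grouped data, which are omitted here):

```python
import numpy as np
from itertools import product

def em_haplotype_freqs(genotypes, alleles1, alleles2, n_iter=200):
    """Estimate two-locus haplotype frequencies from unphased diploid genotypes.
    genotypes: list of ((a, b), (c, d)) unordered genotypes at locus 1 and locus 2."""
    haps = list(product(alleles1, alleles2))
    freqs = {h: 1.0 / len(haps) for h in haps}               # uniform start

    def consistent_pairs(g1, g2):
        """Unordered haplotype pairs whose alleles reproduce the genotype."""
        pairs = set()
        for h1 in haps:
            for h2 in haps:
                if (sorted((h1[0], h2[0])) == sorted(g1) and
                        sorted((h1[1], h2[1])) == sorted(g2)):
                    pairs.add(tuple(sorted((h1, h2))))
        return pairs

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            pairs = consistent_pairs(g1, g2)
            w = {p: freqs[p[0]] * freqs[p[1]] * (1 if p[0] == p[1] else 2) for p in pairs}
            total = sum(w.values())
            for (h1, h2), wp in w.items():                   # E-step: expected haplotype counts
                counts[h1] += wp / total
                counts[h2] += wp / total
        freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}   # M-step
    return freqs
```

For example, `em_haplotype_freqs([(('A','A'), ('B','b'))], ['A','a'], ['B','b'])` returns frequency 0.5 for each of the A-B and A-b haplotypes, since that single genotype is unambiguous.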
