Similar Articles
 20 similar articles were retrieved (search time: 15 ms).
1.
We consider sequential sampling of pedigrees for genetic analysis. Cannings and Thompson (1977) gave simple, general guidelines for valid sequential sampling schemes. We show that their formulation of the likelihood contains an error, which is, however, easily corrected so as to maintain the validity of the sequential scheme. We also point out that although sequential and fixed-structure pedigree sampling do have the same likelihoods (as Cannings and Thompson showed), and therefore yield the same maximum likelihood point estimates of genetic parameters, they do not necessarily yield the same significance tests or confidence intervals.

2.
The problem of ascertainment in segregation analysis arises when families are selected for study through ascertainment of affected individuals. In this case, ascertainment must be corrected for in data analysis. However, methods for ascertainment correction are not available for many common sampling schemes, e.g., sequential sampling of extended pedigrees (except in the case of "single" selection). Concerns about whether ascertainment correction is even required for large pedigrees, about whether and how multiple probands in the same pedigree can be taken into account properly, and about how to apply sequential sampling strategies have occupied many investigators in recent years. We address these concerns by reconsidering a central issue, namely, how to handle pedigree structure (including size). We introduce a new distinction, between sampling in such a way that observed pedigree structure does not depend on which pedigree members are probands (proband-independent [PI] sampling) and sampling in such a way that observed pedigree structure does depend on who the probands are (proband-dependent [PD] sampling). This distinction corresponds roughly (but not exactly) to the distinction between fixed-structure and sequential sampling. We show that conditioning on observed pedigree structure in ascertained data sets obtained under PD sampling is not in general correct (with the exception of "single" selection), while PI sampling of pedigree structures larger than simple sibships is generally not possible. Yet, in practice one has little choice but to condition on observed pedigree structure. We conclude that the problem of genetic modeling in ascertained data sets is, in most situations, literally intractable. We recommend that future efforts focus on the development of robust approximate approaches to the problem.

3.
A Gibbs sampling approach to linkage analysis.
We present a Monte Carlo approach to estimation of the recombination fraction theta and the profile likelihood for a dichotomous trait and a single marker gene with 2 alleles. The method is an application of a technique known as 'Gibbs sampling', in which random samples of each of the unknowns (here genotypes, theta and nuisance parameters, including the allele frequencies and the penetrances) are drawn from their posterior distributions, given the data and the current values of all the other unknowns. Upon convergence, the resulting samples derive from the marginal distribution of all the unknowns, given only the data, so that the uncertainty in the specification of the nuisance parameters is reflected in the variance of the posterior distribution of theta. Prior knowledge about the distribution of theta and the nuisance parameters can be incorporated using a Bayesian approach, but adoption of a flat prior for theta and point priors for the nuisance parameters would correspond to the standard likelihood approach. The method is easy to program, runs quickly on a microcomputer, and could be generalized to multiple alleles, multipoint linkage, continuous phenotypes and more complex models of disease etiology. The basic approach is illustrated by application to data on cholesterol levels and a low-density lipoprotein receptor gene in a single large pedigree.
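To make the alternating-conditional structure concrete, the following is a minimal sketch of a Gibbs sampler for a toy linkage problem: a single doubly heterozygous parent of unknown phase, n informative meioses, and a flat Beta(1,1) prior on theta. It illustrates the sampling scheme described above rather than the authors' implementation; all data and parameter values are hypothetical, and the usual restriction of theta to [0, 0.5] is ignored.

```python
import numpy as np

# Toy Gibbs sampler for the recombination fraction theta.
# Latent unknown: parental phase in {0, 1} (prior 1/2 each).
# Observed: gamete class c_i in {0, 1} for each informative meiosis;
# under phase 0 a class-1 gamete is recombinant, under phase 1 it is not.

rng = np.random.default_rng(1)
c = rng.binomial(1, 0.2, size=40)       # hypothetical observed gamete classes
n, k = c.size, c.sum()

theta, phase = 0.25, 0                  # initial values
draws = []
for it in range(5000):
    # 1) sample phase | theta, data  (two-point discrete full conditional)
    w0 = theta**k * (1 - theta)**(n - k)        # likelihood if phase = 0
    w1 = theta**(n - k) * (1 - theta)**k        # likelihood if phase = 1
    phase = rng.random() < w1 / (w0 + w1)       # True -> phase 1
    # 2) sample theta | phase, data  (conjugate Beta full conditional)
    rec = (n - k) if phase else k               # number of recombinant meioses
    theta = rng.beta(1 + rec, 1 + n - rec)
    draws.append(theta)

burned = np.array(draws[1000:])
print(f"posterior mean of theta: {burned.mean():.3f}")
```

In the full setting described in the abstract, the same loop would also cycle through genotypes, allele frequencies, and penetrances, each drawn from its own full conditional distribution.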

4.
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N2, where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N.
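Written schematically, with r denoting the pairwise relationship and the proportionality constant for the pedigree case left unspecified, the two scalings stated above are

\[
\operatorname{Var}\!\left(\hat h^{2}\right) \;\propto\; \frac{1}{N\,\operatorname{Var}(r)} \quad\text{(pedigree designs)},
\qquad
\operatorname{Var}\!\left(\hat h^{2}\right) \;\approx\; \frac{2}{N^{2}\,\operatorname{Var}(r)} \quad\text{(population samples)},
\]

where the factor 2 in the population-sample expression is the value commonly quoted for (RE)ML estimation from marker-based relationships among nominally unrelated individuals.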

5.
Pedigree data can be evaluated, and subsequently corrected, by analysis of the distribution of genetic markers, taking account of the possibility of mistyping. Using a model of pedigree error developed previously, we obtained the maximum likelihood estimates of error parameters in pedigree data from Tokelau. Posterior probabilities for the possible true relationships in each family, conditional on the putative relationships and the marker data, are calculated using the parameter estimates. These probabilities are used as a basis for discriminating between pedigree error and genetic marker errors in families where inconsistencies have been observed. When applied to the Tokelau data and compared with the results of retyping inconsistent families, these statistical procedures are able to discriminate between pedigree and marker error, with approximately 90% accuracy, for families with two or more offspring. The large proportion of inconsistencies inferred to be due to marker error (61%) indicates the importance of discriminating between error sources when judging the reliability of putative relationship data. Application of our model of pedigree error has proved to be an efficient way of determining and subsequently correcting sources of error in extensive pedigree data collected in large surveys.

6.
Outcome-dependent sampling (ODS) schemes can be a cost-effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator has asymptotic normality properties. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.
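The Wilks-type property referred to above can be written generically (this is the standard form of such results, not the paper's specific theorem): for a q-dimensional parameter of interest beta, with the covariate distribution profiled out,

\[
-2\log\frac{\mathrm{EL}(\beta_{0})}{\sup_{\beta}\mathrm{EL}(\beta)} \;\xrightarrow{\;d\;}\; \chi^{2}_{q}
\quad\text{under } H_{0}:\ \beta=\beta_{0},
\]

so critical values can be taken from the chi-square distribution without estimating the nuisance distribution of the covariates.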

7.
The ascertainment problem arises when families are sampled by a nonrandom process and some assumption about this sampling process must be made in order to estimate genetic parameters. Under classical ascertainment assumptions, estimation of genetic parameters cannot be separated from estimation of the parameters of the ascertainment process, so that any misspecification of the ascertainment process causes biases in estimation of the genetic parameters. Ewens and Shute proposed a resolution to this problem, involving conditioning the likelihood of the sample on the part of the data which is "relevant to ascertainment." The usefulness of this approach can only be assessed by examining the properties (in particular, bias and standard error) of the estimates which arise by using it for a wide range of parameter values and family size distributions and then comparing these biases and standard errors with those arising under classical ascertainment procedures. These comparisons are carried out in the present paper, and we also compare the proposed method with procedures which condition on, or ignore, parts of the data.

8.
Ascertainment-adjusted parameter estimates from a genetic analysis are typically assumed to reflect the parameter values in the original population from which the ascertained data were collected. Burton et al. (2000) recently showed that, given unmodeled parameter heterogeneity, the standard ascertainment adjustment leads to biased parameter estimates of the population-based values. This finding has important implications in complex genetic studies, because of the potential existence of unmodeled genetic parameter heterogeneity. The authors further made the important point that, given unmodeled heterogeneity, the ascertainment-adjusted parameter estimates reflect the true parameter values in the ascertained subpopulation. They illustrated these statements with two examples. By revisiting these examples, we demonstrate that if the ascertainment scheme and the nature of the data can be correctly modeled, then an ascertainment-adjusted analysis returns population-based parameter estimates. We further demonstrate that if the ascertainment scheme and data cannot be modeled properly, then the resulting ascertainment-adjusted analysis produces parameter estimates that generally do not reflect the true values in either the original population or the ascertained subpopulation.

9.
Quantitative traits measured in human families can be analyzed to partition the total population variance into genetic and environmental components, or to elucidate the genetic mechanism involved. We review the estimation of variance components directly from human pedigree data, or in the form of path coefficients from correlations between pairs of relatives. To elucidate genetic mechanisms, a mixed model that allows for segregation at a major locus, a polygenic effect and a sibling environmental correlation is described for nuclear families. In each case appropriate likelihoods are derived as a basis, using numerical maximum likelihood methods, for parameter estimation and hypothesis testing. A general model is then described that allows for several familial sources of environmental variation, assortative mating, and both major gene and polygenic effects; and an algorithm for calculating the likelihood of a pedigree under this model is indicated. Finally, some of the remaining problems in this area of biometric analysis are pointed out.
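A schematic of the mixed model described above (the notation is illustrative, not the review's own): for a phenotype x of an individual with major-locus genotype g,

\[
x = \mu_{g} + a + e, \qquad a \sim N(0,\sigma^{2}_{a}),\quad e \sim N(0,\sigma^{2}_{e}),
\qquad \operatorname{Var}(x) = \sigma^{2}_{g} + \sigma^{2}_{a} + \sigma^{2}_{e},
\]

where \(\sigma^{2}_{g}\) is the variance contributed by segregation at the major locus, a is the polygenic effect, and a shared-sibship component can be added to the residual to induce the sibling environmental correlation.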

10.
We report the results of a simulation study designed to assess the capability of segregation analysis to detect Mendelian transmission and to estimate genetic model parameters for complex qualitative traits, characterized by heritability in the range 0.20-0.45 and low heterozygote penetrance. The pedigree analysis package, PAP, was used to perform the analyses. For all data sets, models of no transmission could be rejected. In most cases, models of Mendelian transmission could not be rejected; however, several samples approached significance levels. When Mendelian transmission was assumed, reasonably good parameter estimates were obtained, although heterozygote penetrances were often overestimated. Different sampling schemes were imposed on the simulated data in order to examine the extent of information loss with the reduction in sample size. One of these strategies (a sequential sampling scheme) appears to have resulted in critical loss of information in some cases.

11.
A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported a low‐density forest model when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used. We also saw variations when AIC was used compared to BIC. ISS supported this model as well as additional models when testing hypotheses of land cover types that create the greatest resistance to gene flow for Columbia spotted frogs. Increased sampling density and study extent, seen by comparing PSS to PASS, showed a change in model support. As the number of individuals increased, model support from ISS converged to that from PSS at 7–9 individuals. ISS may be useful to increase study extent and sampling density, but may lack power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on sampling design effects on landscape genetics inference.
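For reference, the two information criteria being compared take their standard forms (k estimated parameters, maximized likelihood \(\hat L\), n observations):

\[
\mathrm{AIC} = -2\ln\hat L + 2k, \qquad \mathrm{BIC} = -2\ln\hat L + k\ln n,
\]

so BIC penalizes additional resistance variables more heavily once n exceeds about 8, which is one reason the two criteria can rank candidate resistance models differently.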

12.
The problem of ascertainment for linkage analysis.
It is generally believed that ascertainment corrections are unnecessary in linkage analysis, provided individuals are selected for study solely on the basis of trait phenotype and not on the basis of marker genotype. The theoretical rationale for this is that standard linkage analytic methods involve conditioning likelihoods on all the trait data, which may be viewed as an application of the ascertainment assumption-free (AAF) method of Ewens and Shute. In this paper, we show that when the observed pedigree structure depends on which relatives within a pedigree happen to have been the probands (proband-dependent, or PD, sampling) conditioning on all the trait data is not a valid application of the AAF method and will result in asymptotically biased estimates of genetic parameters (except under single ascertainment). Furthermore, this result holds even if the recombination fraction R is the only parameter of interest. Since the lod score is proportional to the likelihood of the marker data conditional on all the trait data, this means that when data are obtained under PD sampling the lod score will yield asymptotically biased estimates of R, and that so-called mod scores (i.e., lod scores maximized over both R and parameters theta of the trait distribution) will yield asymptotically biased estimates of R and theta. Furthermore, the problem appears to be intractable, in the sense that it is not possible to formulate the correct likelihood conditional on observed pedigree structure. In this paper we do not investigate the numerical magnitude of the bias, which may be small in many situations. On the other hand, virtually all linkage data sets are collected under PD sampling. Thus, the existence of this bias will be the rule rather than the exception in the usual applications.

13.
A number of procedures have been developed that allow the genetic parameters of natural populations to be estimated using relationship information inferred from marker data rather than known pedigrees. Three published approaches are available: the regression, pair‐wise likelihood and Markov Chain Monte Carlo (MCMC) sib‐ship reconstruction methods. These were applied to body weight and molecular data collected from the Soay sheep population of St. Kilda, which has a previously determined pedigree. The regression and pair‐wise likelihood approaches do not specify an exact pedigree and yielded unreliable heritability estimates that were sensitive to alteration of the fixed effects. The MCMC method, which specifies a pedigree prior to heritability estimation, yielded results closer to those determined using the known pedigree. In populations of low average relationship, such as the Soay sheep population, determination of a reliable pedigree is more useful than indirect approaches that do not specify a pedigree.

14.
E A Thompson, R G Shaw. Biometrics, 1990, 46(2): 399-413
Recent developments in the animal breeding literature facilitate estimation of the variance components in quantitative genetic models. However, computation remains intensive, and many of the procedures are restricted to specialized designs and models, unsuited to data arising from studies of natural populations. We develop algorithms that allow maximum likelihood estimation of variance components for data on arbitrary pedigree structures. The proposed methods can be implemented on microcomputers, since no intensive matrix computations or manipulations are involved. Although parts of our procedures have been previously presented, we unify these into an overall scheme whose intuitive justification clarifies the approach. Two examples are analyzed: one of data on a natural population of Salvia lyrata and the other of simulated data on an extended pedigree.
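As a schematic of the class of models involved (not the paper's specific algorithm), the individual ("animal") model and its likelihood can be written

\[
\mathbf{y} = X\boldsymbol\beta + Z\mathbf{a} + \mathbf{e}, \qquad
\mathbf{a} \sim N(\mathbf{0}, A\sigma^{2}_{a}), \quad \mathbf{e} \sim N(\mathbf{0}, I\sigma^{2}_{e}),
\]
\[
\ell(\boldsymbol\beta,\sigma^{2}_{a},\sigma^{2}_{e}) = -\tfrac12\left[\log|V| + (\mathbf{y}-X\boldsymbol\beta)^{\mathsf T} V^{-1}(\mathbf{y}-X\boldsymbol\beta)\right] + \text{const},
\qquad V = ZAZ^{\mathsf T}\sigma^{2}_{a} + I\sigma^{2}_{e},
\]

where A is the additive relationship matrix implied by the pedigree; the point of the algorithms described above is to maximize such a likelihood without forming or manipulating these large matrices explicitly.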

15.
The fine-scale spatial genetic structure (SGS) of alpine plants is receiving increasing attention, from which seed and pollen dispersal can be inferred. However, estimation of SGS may depend strongly on the sampling strategy, including the sample size and spatial sampling scheme. Here, we examined the effects of sample size and three spatial schemes, simple-random, line-transect, and random-cluster sampling, on the estimation of SGS in Androsace tapete, an alpine cushion plant endemic to the Qinghai-Tibetan Plateau. Using both real data and simulated data of dominant molecular markers, we show that: (i) SGS is highly sensitive to sample strategy especially when the sample size is small (e.g., below 100); (ii) the commonly used SGS parameter (the intercept of the autocorrelogram) is more susceptible to sample error than a newly developed Sp statistic; and (iii) the random-cluster scheme is susceptible to obvious bias in parameter estimation even when the sample size is relatively large (e.g., above 200). Overall, the line-transect scheme is recommendable, in that it performs slightly better than the simple-random scheme in parameter estimation and is more efficient to encompass broad spatial scales. The consistency between simulated data and real data implies that these findings might hold true in other alpine plants and more species should be examined in future work.

16.
A new method for segregation and linkage analysis, with pedigree data, is described. Reversible jump Markov chain Monte Carlo methods are used to implement a sampling scheme in which the Markov chain can jump between parameter subspaces corresponding to models with different numbers of quantitative-trait loci (QTL's). Joint estimation of QTL number, position, and effects is possible, avoiding the problems that can arise from misspecification of the number of QTL's in a linkage analysis. The method is illustrated by use of a data set simulated for the 9th Genetic Analysis Workshop; this data set had several oligogenic traits, generated by use of a 1,497-member pedigree. The mixing characteristics of the method appear to be good, and the method correctly recovers the simulated model from the test data set. The approach appears to have great potential both for robust linkage analysis and for the answering of more general questions regarding the genetic control of complex traits.
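For orientation, the generic reversible-jump acceptance probability (the general form due to Green, not the paper's specific proposal scheme) for a move from model k with parameters \(\theta_{k}\) to model k' with parameters \(\theta_{k'}\) is

\[
\alpha = \min\left\{1,\;
\frac{p(\mathbf{y}\mid k',\theta_{k'})\,p(\theta_{k'}\mid k')\,p(k')}
     {p(\mathbf{y}\mid k,\theta_{k})\,p(\theta_{k}\mid k)\,p(k)}
\cdot \frac{q(k\mid k')\,g'(\mathbf{u}')}{q(k'\mid k)\,g(\mathbf{u})}
\cdot \left|\frac{\partial(\theta_{k'},\mathbf{u}')}{\partial(\theta_{k},\mathbf{u})}\right|
\right\},
\]

where u and u' are auxiliary variables used to match the dimensions of the two subspaces and the final factor is the Jacobian of the dimension-matching transformation. In the setting above, k indexes the number of QTLs and \(\theta_{k}\) collects their positions and effects.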

17.
The fine‐scale spatial genetic structure (SGS) of alpine plants is receiving increasing attention, from which seed and pollen dispersal can be inferred. However, estimation of SGS may depend strongly on the sampling strategy, including the sample size and spatial sampling scheme. Here, we examined the effects of sample size and three spatial schemes, simple‐random, line‐transect, and random‐cluster sampling, on the estimation of SGS in Androsace tapete, an alpine cushion plant endemic to the Qinghai‐Tibetan Plateau. Using both real data and simulated data of dominant molecular markers, we show that: (i) SGS is highly sensitive to sample strategy especially when the sample size is small (e.g., below 100); (ii) the commonly used SGS parameter (the intercept of the autocorrelogram) is more susceptible to sample error than a newly developed Sp statistic; and (iii) the random‐cluster scheme is susceptible to obvious bias in parameter estimation even when the sample size is relatively large (e.g., above 200). Overall, the line‐transect scheme is recommendable, in that it performs slightly better than the simple‐random scheme in parameter estimation and is more efficient to encompass broad spatial scales. The consistency between simulated data and real data implies that these findings might hold true in other alpine plants and more species should be examined in future work.

18.
Optimal experiment design for parameter estimation (OED/PE) has become a popular tool for efficient and accurate estimation of kinetic model parameters. When the kinetic model under study involves multiple parameters, different optimization strategies can be constructed. The most straightforward approach is to estimate all parameters simultaneously from one optimal experiment (single OED/PE strategy). However, due to the complexity of the optimization problem or the stringent limitations on the system's dynamics, the experimental information can be limited and parameter estimation convergence problems can arise. As an alternative, we propose to reduce the optimization problem to a series of two-parameter estimation problems, i.e., an optimal experiment is designed for a combination of two parameters while presuming the other parameters known. Two different approaches can be followed: (i) all two-parameter optimal experiments are designed based on identical initial parameter estimates and parameters are estimated simultaneously from all resulting experimental data (global OED/PE strategy), and (ii) optimal experiments are calculated and implemented sequentially whereby the parameter values are updated intermediately (sequential OED/PE strategy). This work exploits OED/PE for the identification of the Cardinal Temperature Model with Inflection (CTMI) (Rosso et al., 1993). This kinetic model describes the effect of temperature on the microbial growth rate and has four parameters. The three OED/PE strategies are considered and the impact of the OED/PE design strategy on the accuracy of the CTMI parameter estimation is evaluated. Based on a simulation study, it is observed that the parameter values derived from the sequential approach deviate more from the true parameters than the single and global strategy estimates. The single and global OED/PE strategies are further compared based on experimental data obtained from design implementation in a bioreactor. Comparable estimates are obtained, but global OED/PE estimates are, in general, more accurate and reliable.
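For context, a small sketch of the CTMI growth-rate model in the form commonly attributed to Rosso et al. (1993); the cardinal-temperature values below are placeholders for illustration, not those estimated in the study.

```python
import numpy as np

def ctmi(T, mu_opt, T_min, T_opt, T_max):
    """Cardinal Temperature Model with Inflection (Rosso et al., 1993 form).

    Returns the maximum specific growth rate at temperature T, and zero
    outside the (T_min, T_max) interval. The four parameters are the
    quantities an OED/PE scheme for this model would try to estimate.
    """
    T = np.asarray(T, dtype=float)
    num = (T - T_max) * (T - T_min) ** 2
    den = (T_opt - T_min) * (
        (T_opt - T_min) * (T - T_opt)
        - (T_opt - T_max) * (T_opt + T_min - 2.0 * T)
    )
    mu = mu_opt * num / den
    return np.where((T > T_min) & (T < T_max), mu, 0.0)

# Placeholder parameter values for illustration only.
temps = np.linspace(0, 50, 6)
print(ctmi(temps, mu_opt=0.8, T_min=5.0, T_opt=37.0, T_max=45.0))
```

An OED/PE scheme would then choose temperature profiles that maximize a scalar function of the Fisher information matrix for these four parameters, either jointly (single/global strategies) or two at a time (sequential strategy).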

19.
Meyer K. Heredity, 2008, 101(3): 212-221
Mixed model analyses via restricted maximum likelihood, fitting the so-called animal model, have become standard methodology for the estimation of genetic variances. Models involving multiple genetic variance components, due to different modes of gene action, are readily fitted. It is shown that likelihood-based calculations may provide insight into the quality of the resulting parameter estimates, and are directly applicable to the validation of experimental designs. This is illustrated for the example of a design suggested recently to estimate X-linked genetic variances. In particular, large sample variances and sampling correlations are demonstrated to provide an indication of 'problem' scenarios. Using simulation, it is shown that the profile likelihood function provides more appropriate estimates of confidence intervals than large sample variances. Examination of the likelihood function and its derivatives are recommended as part of the design stage of quantitative genetic experiments.
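The profile-likelihood intervals referred to above take their usual form: for a single variance component \(\sigma^{2}_{x}\), with the remaining parameters \(\boldsymbol\psi\) maximized out,

\[
\ell_{p}(\sigma^{2}_{x}) = \max_{\boldsymbol\psi}\,\ell(\sigma^{2}_{x},\boldsymbol\psi),
\qquad
\mathrm{CI}_{1-\alpha} = \left\{\sigma^{2}_{x} : 2\left[\ell(\hat\theta) - \ell_{p}(\sigma^{2}_{x})\right] \le \chi^{2}_{1,\,1-\alpha}\right\},
\]

which, unlike intervals based on large-sample variances, need not be symmetric about the point estimate.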

20.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was given to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite.
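The gamma model of among-site rate variation mentioned above is often made computationally tractable by discretizing it into a small number of equal-probability rate categories. The sketch below is a generic illustration of that discretization (not the paper's own code): it computes the category mean rates for a mean-one gamma with shape parameter alpha.

```python
import numpy as np
from scipy.stats import gamma

def discrete_gamma_rates(alpha, n_cat=4):
    """Mean rates of n_cat equal-probability categories of a mean-one gamma.

    For X ~ Gamma(shape=alpha, scale=1/alpha), the partial expectation over
    (a, b) equals F_{alpha+1}(b) - F_{alpha+1}(a); dividing by the category
    probability 1/n_cat gives each category's mean rate.
    """
    scale = 1.0 / alpha
    # Equal-probability category boundaries (0, q_1, ..., q_{K-1}, inf).
    bounds = gamma.ppf(np.arange(n_cat + 1) / n_cat, a=alpha, scale=scale)
    partial = gamma.cdf(bounds, a=alpha + 1.0, scale=scale)
    return n_cat * np.diff(partial)

print(discrete_gamma_rates(alpha=0.5))   # strong rate variation among sites
```

A small alpha (as used here) corresponds to strong rate heterogeneity among sites, the situation in which ignoring rate variation is most likely to mislead tree reconstruction.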
