Similar Documents
20 similar documents found.
1.
One of the first and most important steps in planning a genetic association study is the accurate estimation of statistical power under a proposed study design and sample size. In association studies of candidate genes, or in fine-mapping applications, allele and genotype frequencies are often assumed to be known when, in fact, they are unknown (i.e., they are random variables from some distribution). For example, for a diallelic marker with allele frequencies of 0.5 and 0.5 in Hardy-Weinberg proportions, the three genotype frequencies are often assumed to be exactly 0.25, 0.50, and 0.25, and the statistical power is calculated from those values. Unfortunately, ignoring this source of variation can inflate the estimated power of the study. In the present article, we propose averaging the power estimates over the distribution of the genotype frequencies to calculate the true power for a fixed allele frequency. For the usual situation, in which the allele frequencies in a population are not known, we propose placing a prior distribution on the allele frequency, taking advantage of any available genotype information. This Bayesian approach provides a more accurate estimate of power. We present examples for quantitative and qualitative traits in cohort studies of unrelated individuals, together with results from an extensive series of examples showing that ignoring the uncertainty in allele frequencies can inflate the estimated power of the study. We also present results from case-control studies and show that standard methods may overestimate power there as well. As discussed in this article, fixing allele frequencies even when they are not known is the common approach to power calculation. We show that ignoring the sources of variation in allele frequencies tends to overestimate power and, consequently, to produce studies that are underpowered. Software in C is available at http://www.ambrosius.net/Power/.
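The averaging idea can be illustrated for an additive quantitative trait under a normal approximation. This is a minimal sketch and not the authors' C software. The trait model, the noncentrality formula, the Beta prior, and function names such as `averaged_power` are illustrative assumptions. The conventional calculation plugs in the Hardy-Weinberg dosage variance 2p(1-p). The averaged versions instead simulate genotype samples, either with p fixed or with p drawn from a prior, and average the resulting power.

```python
import random
from statistics import NormalDist, pvariance

_STD_NORMAL = NormalDist()


def power_from_variance(n, beta, sigma2, var_g, alpha=0.05):
    """Normal-approximation power of a two-sided test for an additive effect.

    Assumed model: noncentrality = sqrt(n * beta^2 * Var(G) / sigma2), where G
    is the allele dosage (0, 1 or 2) and sigma2 is the residual trait variance.
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = (n * beta ** 2 * var_g / sigma2) ** 0.5
    return _STD_NORMAL.cdf(ncp - z) + _STD_NORMAL.cdf(-ncp - z)


def fixed_power(n, p, beta, sigma2, alpha=0.05):
    """Conventional calculation: plug in the HWE dosage variance 2p(1-p)."""
    return power_from_variance(n, beta, sigma2, 2 * p * (1 - p), alpha)


def averaged_power(n, beta, sigma2, p=None, prior=None, draws=300,
                   alpha=0.05, seed=1):
    """Average power over randomly drawn genotype samples.

    If `prior` is an (a, b) pair, the allele frequency is first drawn from
    Beta(a, b) (a Bayesian variant); otherwise it is fixed at `p`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        q = rng.betavariate(*prior) if prior else p
        # Each dosage is the sum of two independent allele draws (HWE).
        dosages = [(rng.random() < q) + (rng.random() < q) for _ in range(n)]
        total += power_from_variance(n, beta, sigma2, pvariance(dosages), alpha)
    return total / draws
```

For example, `fixed_power(200, 0.5, beta=0.3, sigma2=1.0)` gives about 0.85. `averaged_power(200, beta=0.3, sigma2=1.0, prior=(2, 2))` is noticeably lower, because the prior spreads p away from 0.5 and reduces the expected dosage variance.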

2.
Case-control disease-marker association studies are often used in the search for variants that predispose to complex diseases. One approach to increasing the power of these studies is to enrich the case sample for individuals likely to be affected because of genetic factors. In this article, we compare three case-selection strategies that use allele-sharing information with the standard strategy, which selects a single individual from each family at random. In samples of affected sibships, we show that by carefully selecting sibships and/or individuals on the basis of allele sharing, we can increase the frequency of disease-associated alleles in the case sample. When these cases are compared with unrelated controls, the difference in the frequency of the disease-associated allele is therefore also increased. We find that choosing, in each family, the affected sib who shows the most evidence of pairwise allele sharing with the other affected sibs increases the test statistic by more than 20%, on average, for additive models with modest genotype relative risks. In addition, the per-genotype information associated with the allele-sharing-based strategies is greater than that associated with random selection of a sib for genotyping. Even though we select sibs on the basis of a nonparametric statistic, the additional gain from selection based on the unknown underlying mode of inheritance is minimal. We show that these properties hold even when the power to detect linkage to a region in the entire sample is negligible. This approach can be extended to more general pedigree structures and to quantitative traits.
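The best-performing rule above picks the affected sib with the most pairwise allele sharing. A minimal sketch of that rule follows. It assumes that pairwise identical-by-descent (IBD) sharing estimates have already been obtained, for example from a linkage program. The matrix layout and the function name `select_case` are hypothetical.

```python
def select_case(sharing):
    """Pick the affected sib with the greatest total pairwise allele sharing.

    `sharing[i][j]` is an estimate of the number of alleles (0 to 2) shared
    IBD between affected sibs i and j; the diagonal is ignored. Ties go to the
    lower index, because max() returns the first maximal element.
    """
    totals = [
        sum(row[j] for j in range(len(row)) if j != i)
        for i, row in enumerate(sharing)
    ]
    return max(range(len(totals)), key=lambda i: totals[i])
```

With `sharing = [[2, 1, 0], [1, 2, 2], [0, 2, 2]]`, the totals are 1, 3 and 2, so sib 1 is chosen for genotyping as the case.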

3.
Microsatellite null alleles and estimation of population differentiation (cited 20 times; self-citations: 0; citations by others: 20)
Microsatellite null alleles are commonly encountered in population genetics studies, yet little is known about their impact on the estimation of population differentiation. Computer simulations based on the coalescent were used to investigate the evolutionary dynamics of null alleles, their impact on F(ST) and genetic distances, and the efficiency of estimators of null allele frequency. Further, we explored how well the existing method for correcting genotype data for null alleles performed in estimating F(ST) and genetic distances, and we compared this method with a new method proposed here (for F(ST) only). Null alleles were likely to be encountered in populations that have a large effective size, that have an unusually high mutation rate in the flanking regions, and that have diverged from the population from which the cloned allele was drawn and the primers designed. When populations were significantly differentiated, F(ST) and genetic distances were overestimated in the presence of null alleles. The frequency of null alleles was estimated precisely with the expectation-maximization (EM) algorithm of Dempster et al. (1977). The conventional method for correcting genotype data for null alleles did not provide accurate estimates of F(ST) and genetic distances. However, the genetic distance of Cavalli-Sforza and Edwards (1967), computed from data corrected by the conventional method, gave better estimates than those obtained without correction. F(ST) estimation from corrected genotype frequencies performed well when restricted to visible allele sizes. Both the proposed method and the traditional correction method have been implemented in a program that is available free of charge at http://www.montpellier.inra.fr/URLB/. We used two published microsatellite data sets, based on original and redesigned primer pairs, to confirm our simulation results empirically.
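The EM approach to null-allele frequency can be sketched as follows. An apparent homozygote for allele i may truly be i/i or i/null. The E-step splits these individuals in proportion to p_i² and 2·p_i·p_0, and the M-step recounts alleles. This is a minimal sketch, not the program cited above. Treating non-amplifying individuals ("blanks") as null/null homozygotes is an assumption, since blanks can also be technical failures. The function name `em_null_allele` is hypothetical.

```python
def em_null_allele(het, hom, blanks, tol=1e-10, max_iter=10_000):
    """EM estimate of visible-allele and null-allele frequencies.

    het:    dict mapping (i, j) pairs of distinct visible alleles to counts
    hom:    dict mapping visible allele i to apparent-homozygote counts
            (true genotype i/i or i/null)
    blanks: individuals with no amplification, assumed here to be null/null
    Returns (dict of visible allele frequencies, null allele frequency).
    """
    alleles = sorted({a for pair in het for a in pair} | set(hom))
    n_ind = sum(het.values()) + sum(hom.values()) + blanks
    start = 1.0 / (len(alleles) + 1)
    p, p0 = {a: start for a in alleles}, start
    for _ in range(max_iter):
        counts = {a: 0.0 for a in alleles}
        null_count = 2.0 * blanks
        for (i, j), n in het.items():
            counts[i] += n
            counts[j] += n
        for i, n in hom.items():
            # E-step: expected share of apparent homozygotes that are truly i/i.
            share_ii = p[i] ** 2 / (p[i] ** 2 + 2 * p[i] * p0)
            counts[i] += n * (2 * share_ii + (1 - share_ii))
            null_count += n * (1 - share_ii)
        # M-step: new frequencies are expected allele counts over 2N.
        new_p = {a: c / (2 * n_ind) for a, c in counts.items()}
        new_p0 = null_count / (2 * n_ind)
        delta = max(abs(new_p[a] - p[a]) for a in alleles) + abs(new_p0 - p0)
        p, p0 = new_p, new_p0
        if delta < tol:
            break
    return p, p0
```

As a check, data generated exactly from frequencies A = B = 0.4 and null = 0.2 in 1,000 individuals (320 AB, 320 apparent AA, 320 apparent BB, 40 blanks) are recovered as (0.4, 0.4, 0.2).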

4.
Kang SJ, Finch SJ, Haynes C, Gordon D. Human Heredity, 2004, 58(3-4):139-144
Kang et al. [Genet Epidemiol 2004;26:132-141] addressed the question of which genotype misclassification errors are most costly, in terms of the minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association in a genetic model-free setting. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. Here we address the same question in a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequencies, and linkage disequilibrium. The %MSSN coefficient of each of the six possible error rates is determined by expanding the noncentrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis, which approximates %MSSN by a linear Taylor series in the error rates. In this work we assume that errors misclassifying one homozygote as the other homozygote have a rate of 0, since such errors are thought to occur rarely in practice. We find that some settings of the genetic model parameters lead to a large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, the total %MSSN increases without bound, independent of the other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. The use of SNPs with a small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use in studying a disease is given.
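The logic can be sketched from first principles. The noncentrality parameter of the 2 × 3 χ² test grows linearly with sample size, so keeping power fixed after errors dilute the signal requires N' = N · λ_clean / λ_noisy. The linear coefficients are then derivatives of %MSSN with respect to each error rate. This is a sketch under stated assumptions, not Kang et al.'s software. It assumes equal numbers of cases and controls and identical error rates in both groups, and it computes the coefficients by finite differences rather than by an analytic Taylor expansion. All function names are hypothetical.

```python
GENOTYPES = ("AA", "Aa", "aa")


def observed_freqs(freqs, errors):
    """Apply misclassification: errors[(g, h)] = P(record h | true g), g != h.

    Missing pairs count as 0, so homozygote-to-homozygote errors can be omitted.
    """
    out = {h: 0.0 for h in GENOTYPES}
    for g in GENOTYPES:
        stay = 1.0 - sum(errors.get((g, h), 0.0) for h in GENOTYPES if h != g)
        for h in GENOTYPES:
            rate = stay if h == g else errors.get((g, h), 0.0)
            out[h] += freqs[g] * rate
    return out


def ncp_per_subject(case, control, case_fraction=0.5):
    """Noncentrality of the 2 x 3 chi-square test of independence, per subject."""
    lam = 0.0
    for g in GENOTYPES:
        col = case_fraction * case[g] + (1 - case_fraction) * control[g]
        for row, f in ((case_fraction, case[g]), (1 - case_fraction, control[g])):
            lam += (row * f - row * col) ** 2 / (row * col)
    return lam


def pct_mssn(case, control, errors):
    """Percent increase in sample size needed to keep the same asymptotic power."""
    clean = ncp_per_subject(case, control)
    noisy = ncp_per_subject(observed_freqs(case, errors),
                            observed_freqs(control, errors))
    return 100.0 * (clean / noisy - 1.0)


def mssn_coefficients(case, control, h=1e-6):
    """First-order %MSSN coefficient for each allowed error rate."""
    pairs = [(g, k) for g in GENOTYPES for k in GENOTYPES
             if g != k and {g, k} != {"AA", "aa"}]
    return {pair: pct_mssn(case, control, {pair: h}) / h for pair in pairs}
```

With hypothetical genotype frequencies such as `case = {"AA": 0.30, "Aa": 0.50, "aa": 0.20}` and `control = {"AA": 0.36, "Aa": 0.48, "aa": 0.16}`, `pct_mssn` is 0 with no errors and positive for any error rate. The total %MSSN under a set of small error rates is then approximately the sum of each coefficient times its rate.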

5.
In south-eastern France, local honey bees carry only the B allele at the MDH locus, whereas the races usually imported into this area lack this allele. The proportion of non-B genes in a sample of drones was therefore used to measure the genetic pollution of the local population. As part of a breeding scheme for local bees, 99 queens of genotype BB were naturally mated between April 25 and June 10, 1985, at la Tave (Gard, France). Twenty worker daughters of each queen were analysed at the MDH locus. The frequency of the B allele in the drones that mated with these queens is estimated by the proportion of workers with genotype BB, and the genetic pollution by the cumulative frequency of the other alleles. The sampling variances of these frequencies involve a coefficient that is a function of the average number of drones mated with a queen; this parameter is estimated by maximum likelihood. In addition to the three well-known alleles, a rare allele (frequency = 0.0055), possibly equivalent to the S1 allele described by Badino et al. (1983), was found in three different colonies. Summing the frequencies of the non-B alleles gives an estimated genetic pollution of 0.0394 (±0.0071). This low value allows us to proceed to the next step of the selection project. The mean number of drones mated to a queen is 12.4, with a 90% confidence interval of 10.4–19.3.

6.
Historically, most methods for detecting linkage disequilibrium were designed for use with diallelic marker loci, for which the analysis is straightforward. With the advent of polymorphic markers with many alleles, the usual approach to their analysis has been either to extend the methodology for two-allele systems (leading to an increase in degrees of freedom and a corresponding loss of power) or to select the allele believed to be associated and collapse the other alleles, reducing the locus, in a biased way, to a diallelic system. I propose a likelihood-based approach to testing for linkage disequilibrium that becomes more conservative as the number of alleles increases, and as the number of markers considered jointly increases in a multipoint test for linkage disequilibrium, while maintaining high power. The properties of this method for detecting associations and for fine mapping the location of disease traits are investigated. It is found to be, in general, more powerful than conventional methods, and it provides a tractable framework for the fine mapping of new disease loci. An application to the cystic fibrosis data of Kerem et al. is included to illustrate the method.

7.
Wang ET, Moyzis RK. Mutation Research, 2007, 616(1-2):165-174
Using the 2.6 million single nucleotide polymorphism (SNP) genotype datasets from Perlegen Sciences and the Haplotype Map (HapMap) project (Phase I freeze), a probabilistic search for the landscape exhibited by positive Darwinian selection was conducted (Wang et al., 2006). By sorting each high-frequency allele by homozygosity, we searched for the expected decay of adjacent SNP linkage disequilibrium (LD) at recently selected alleles, eliminating the need to infer haplotypes. We designate this approach the LD decay (LDD) test. Cluster analysis indicates that approximately 3,000 sites of inferred recent selection, representing approximately 1,800 genes, are present in human DNA. Prior simulation studies (Wang et al., 2006) indicate that this novel LDD test, at the megabase (Mb) scale employed, effectively distinguishes selection from other causes of extensive LD, such as inversions, population bottlenecks, and admixture. On the basis of over-representation analysis, these prior studies showed that several predominant biological themes are common among inferred selected alleles, including genes involved in DNA metabolism and repair. Here, we show that three of these DNA repair genes, ERCC8, Fanconi Anemia Complementation Group C (FANCC), and RAD51C, exhibit genomic architectures consistent with ongoing balanced selection over the last 40,000-50,000 years.

8.
Fan R, Jung J. Human Heredity, 2002, 54(3):132-150
In this paper, we extend the association study methods of both Fan et al. [Hum Hered 2002;53:130-145], in which a quantitative trait locus (QTL) and a multi-allelic marker are considered for trio families, and Fan and Xiong [Biostatistics 2003, in press], in which a QTL and a bi-allelic marker are considered for nuclear families. The objective is to build mixed models for association studies between a QTL and a multi-allelic marker for nuclear families with any number of offspring. Two types of nuclear family data are considered: the first is genetic data on offspring with at least one heterozygous parent, and the second is genetic data on offspring of nuclear families in general. (1) For the data on offspring with at least one heterozygous parent, we assume that at least one parent is heterozygous at the marker locus, so that the transmission of parental marker alleles to the offspring can be inferred unambiguously. We show that these data can be used for association studies in the presence of linkage. The theoretical basis is the difference between the conditional mean of the trait value given that an allele is transmitted from a heterozygous parent and the conditional mean given that the allele is not transmitted. To build valid models, we calculate the variance-covariance structure of the trait values of the offspring. In addition, reducing the number of parameters is discussed under the assumption of tight linkage between the trait locus and the marker. (2) For the data on offspring of nuclear families in general, we show that they can be used in general association studies. In this case, the theoretical basis is the difference between the conditional mean of the trait values given that an allele is transmitted from a parent and the population mean. We then calculate the variance-covariance structure of the trait values of the offspring. (3) Based on this theoretical analysis, mixed models are built for each type of data, and related test statistics are proposed for association studies. Power calculations and comparisons show that, in some instances, the proposed test statistics have higher power than tests that collapse alleles into new categories. The proposed models are used to analyze the chromosome 4 and chromosome 16 data of the Oxford asthma data set from Genetic Analysis Workshop 12.
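The contrast behind the first data type can be written down directly. Among heterozygous parents carrying a given allele, it is the mean offspring trait when that allele is transmitted minus the mean when it is not. This is a minimal sketch of the raw contrast only. It ignores the within-sibship variance-covariance structure that the paper's mixed models account for, so it is not a valid test on its own. The record layout and the name `transmission_contrast` are hypothetical.

```python
from statistics import mean


def transmission_contrast(records, allele):
    """Mean trait given `allele` transmitted minus mean trait given not transmitted.

    records: iterable of (transmitted, untransmitted, trait) tuples, one per
    parent-to-offspring transmission. Only parents heterozygous for `allele`
    are informative; all other transmissions are skipped.
    """
    with_allele, without_allele = [], []
    for transmitted, untransmitted, trait in records:
        if transmitted == untransmitted:
            continue  # homozygous parent: the transmission carries no information
        if transmitted == allele:
            with_allele.append(trait)
        elif untransmitted == allele:
            without_allele.append(trait)
    if not with_allele or not without_allele:
        raise ValueError("need transmissions both with and without the allele")
    return mean(with_allele) - mean(without_allele)
```

For a multi-allelic marker, the contrast is computed allele by allele, which avoids collapsing alleles into artificial categories.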

9.
Microsatellites are the most popular markers for parentage assignment and population genetic studies. To meet the demand for international comparability in genetic studies of Asian seabass, a standard panel of 28 microsatellites has been selected and characterized using the DNA of 24 individuals from Thailand, Malaysia, Indonesia and Australia. The average number of alleles per marker was 10.82 ± 0.71 (range: 6–19), and the expected heterozygosity averaged 0.76 ± 0.02 (range: 0.63–1.00). All microsatellites showed Mendelian inheritance. In addition, eight standard size controls, available upon request, have been developed by cloning a set of microsatellite alleles into a pGEM‐T vector to calibrate allele sizes determined by different laboratories. Seven multiplex PCRs, each amplifying 3–5 markers, were optimized to genotype the microsatellites accurately and rapidly. Parentage assignment using 10 microsatellites in two crosses (10 × 10 and 20 × 20) demonstrated the high power of these markers for revealing parent‐offspring relationships. This standard set of microsatellites will help standardize genetic diversity studies of Asian seabass, and the multiplex PCR sets will facilitate parentage assignment.
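Expected heterozygosity, as reported for this panel, is conventionally computed per locus as 1 − Σ p_i², often with Nei's small-sample correction 2n/(2n − 1). A minimal sketch follows. It assumes that the alleles of all sampled individuals are pooled into one list, and the name `expected_heterozygosity` is hypothetical. The paper does not state which variant it used.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity 1 - sum(p_i^2) from a list of sampled allele copies.

    With unbiased=True, applies Nei's correction 2n/(2n - 1), where 2n is the
    number of allele copies sampled (two per diploid individual).
    """
    total = len(alleles)
    he = 1.0 - sum((c / total) ** 2 for c in Counter(alleles).values())
    return he * total / (total - 1) if unbiased else he
```

For two individuals typed as 150/152 and 150/152, the uncorrected value is 0.5 and the corrected value is about 0.667. With samples as small as 24 individuals, the choice of estimator can therefore matter.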

10.
Genome-wide case–control studies have been widely used to identify genetic variants that predispose to human diseases. Such studies are powerful in detecting common genetic variants with moderate effects, but they quickly lose power as allele frequency and genotype relative risk decrease. Because patients with one or more affected relatives are more likely than patients without a family history of the disease to inherit disease-predisposing alleles, sampling patients with affected relatives almost always increases the frequency of disease-predisposing alleles in cases and improves the power of case–control association studies. This paper evaluates the power of case–control studies that select cases and/or controls according to their family histories of disease. Our results show that this study design can dramatically increase the power of a case–control association study for a wide range of disease types. Because each additional affected relative of a patient reduces the required sample size by roughly one case–control pair, including cases with affected relatives can dramatically decrease the required sample size and thus the cost of such studies.

11.
The fixation of an advantageous mutation in a population reduces variation in the DNA sequence near that mutation. Kaplan et al. (1989) used a three-phase simulation model to study the effect of selective sweeps on genealogies. However, most subsequent work has simplified their approach by assuming that the number of individuals with the advantageous allele follows the logistic differential equation. We show that the impact of a selective sweep can be accurately approximated by a random partition created by a stick-breaking process. Our simulation results show that ignoring the randomness while the number of individuals with the advantageous allele is still small can lead to substantial errors.
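The logistic simplification mentioned above can be set beside the stochastic trajectory it replaces. The logistic curve is the closed-form solution of dx/dt = s·x·(1 − x). The stochastic version starts from a single copy, and its early fate is dominated by drift. This is a minimal sketch under a haploid Wright–Fisher model chosen for illustration. It does not implement the stick-breaking approximation, and both function names are hypothetical.

```python
import math
import random


def logistic_frequency(t, s, x0):
    """Deterministic sweep: solution of dx/dt = s x (1 - x) with x(0) = x0."""
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-s * t))


def wright_fisher_sweep(n, s, seed=0, max_gen=100_000):
    """Stochastic sweep in a haploid Wright-Fisher population of size n.

    Starts from one copy of the advantageous allele and returns the allele
    frequency in each generation until loss (0) or fixation (1).
    """
    rng = random.Random(seed)
    k = 1
    path = [k / n]
    for _ in range(max_gen):
        if k in (0, n):
            break
        x = k / n
        prob = x * (1 + s) / (1 + s * x)           # deterministic selection step
        k = sum(rng.random() < prob for _ in range(n))  # binomial drift step
        path.append(k / n)
    return path
```

Running `wright_fisher_sweep` with different seeds shows that many single-copy mutations are lost within a few generations. Those that survive can take a markedly different time to escape low frequency than the smooth logistic curve suggests, which is the regime where the abstract reports substantial errors.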

12.
Sobel MJ, Arnold J, Sobel M. Biometrics, 1986, 42(1):45-65
In previous work, several models have been developed for genetic surveys of natural populations. Parents of unknown genotype are collected from a natural population that is polymorphic at a single genetic locus. From each of these N cryptic parents, a number of offspring are genotyped. Our problem is to select an efficient offspring sampling plan for estimating the frequency of an allele in the cryptic adult population from the N family profiles of juvenile genotypes. A criterion called the information per unit cost of observation is introduced to evaluate sequential sampling plans, in which the number of offspring examined per family is random. Some simple, practical schemes for stopping the sampling of offspring from a collected parent are introduced; one example is to stop when (i) the offspring are definitive about the parental genotype(s) for the first time, (ii) a fixed number of offspring, all of a single genotype, has been seen, or (iii) a fixed maximum feasible number of offspring has been genotyped. This sampling scheme is recommended. For each sampling scheme, the best linear unbiased estimator and the sequential maximum likelihood estimator of the allele frequency are characterized. From the moments of these estimators, it is then possible to tabulate efficient sequential sampling plans that are better (in the sense of information per unit cost), just as simple, and less costly than the corresponding fixed sampling plans in use.
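The three-part stopping rule can be sketched as a small function that reports how many offspring would be genotyped from one parent. This is a hypothetical interpretation for a diallelic locus, not Sobel et al.'s exact definition. A parent is taken to be shown heterozygous once both homozygotes have appeared among its offspring, which holds whatever the mate's genotype, and rule (ii) is read as "the first k offspring all share one genotype". The name `offspring_to_sample` is hypothetical.

```python
def offspring_to_sample(offspring, run_length, max_offspring):
    """Number of offspring genotyped under a three-part stopping rule.

    offspring: sequence of genotypes 'AA', 'Aa' or 'aa' in sampling order.
    Sampling stops when
      (i)   both 'AA' and 'aa' have been seen (parent shown to be 'Aa'),
      (ii)  the first `run_length` offspring all share a single genotype, or
      (iii) `max_offspring` offspring have been genotyped.
    """
    seen = []
    for genotype in offspring[:max_offspring]:
        seen.append(genotype)
        if "AA" in seen and "aa" in seen:
            return len(seen)  # rule (i): offspring are definitive
        if len(seen) == run_length and len(set(seen)) == 1:
            return len(seen)  # rule (ii): a run of a single genotype
    return len(seen)  # rule (iii), or the family ran out of offspring
```

Averaging this count over families, weighted by the cost of each genotype, gives the kind of per-family cost that an information-per-unit-cost criterion trades off against estimator precision.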

13.
A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low-frequency and rare risk variants, which may account for an important fraction of the disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting the cases, controls, or families that are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how it relates to the variance explained by the risk score, the disease prevalence, and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large, partially genotyped families, based on an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid the discovery of non-genetic risk factors, suggesting a possible future role for genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.

14.
The transmission/disequilibrium test (TDT) developed by Spielman et al. can be a powerful family-based test of linkage and, in some cases, a test of association as well as linkage. It has recently been extended in several ways, including allowance for quantitative traits, allowance for multiple alleles, and, for dichotomous traits, allowance for testing in the absence of parental data. In this article, these three extensions are combined, and two procedures are developed that offer valid joint tests of linkage and (for certain sibling configurations) association with quantitative traits, using data from siblings only, and that can accommodate biallelic or multiallelic loci. The first procedure uses a mixed-effects (i.e., random and fixed effects) analysis of variance in which sibship is the random factor, marker genotype is the fixed factor, and the continuous phenotype is the dependent variable. Covariates can easily be accommodated, and the procedure can be implemented in commonly available statistical software. The second procedure is permutation-based. Selected power studies are conducted to illustrate the relative power of each test under a variety of circumstances.
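A within-sibship permutation test can be sketched as follows. Trait values are shuffled among siblings inside each sibship, which preserves sibship means and so guards against between-family stratification. The statistic used here is an assumption chosen for illustration: the between-genotype sum of squares of trait deviations from sibship means. The paper's permutation statistic may differ, and both function names are hypothetical.

```python
import random
from collections import defaultdict


def within_sibship_ss(sibships):
    """Between-genotype sum of squares of deviations from sibship means.

    sibships: list of sibships, each a list of (genotype, trait) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for sibs in sibships:
        centre = sum(trait for _, trait in sibs) / len(sibs)
        for genotype, trait in sibs:
            sums[genotype] += trait - centre
            counts[genotype] += 1
    return sum(sums[g] ** 2 / counts[g] for g in sums)


def permutation_p_value(sibships, n_perm=2000, seed=0):
    """Permutation p-value, shuffling trait values among sibs within each sibship."""
    rng = random.Random(seed)
    observed = within_sibship_ss(sibships)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for sibs in sibships:
            traits = [t for _, t in sibs]
            rng.shuffle(traits)
            shuffled.append([(g, t) for (g, _), t in zip(sibs, traits)])
        if within_sibship_ss(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Only sibships whose members differ in marker genotype contribute information, which matches the sibling-only design described above.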

15.
The inheritance of alleles at the transforming growth factor alpha (TGFA) locus has been studied in families affected with cleft lip with or without cleft palate (CL/P), using the transmission/disequilibrium test described by Spielman and colleagues. Only heterozygous parents with an affected child can be included in this test, but within such families the C2 allele was transmitted to affected children significantly more often than expected by chance. There was no evidence that the total number of C2 alleles transmitted to affected and unaffected children differed significantly from random segregation. These data provide within-family evidence that a gene for susceptibility to CL/P is in significant linkage disequilibrium with the C2 allele of the TGFA locus.
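For a single allele, the standard TDT compares how often heterozygous parents transmit it (b) with how often they transmit the other allele (c). The McNemar-type statistic (b − c)²/(b + c) has an approximate χ² distribution with 1 degree of freedom. A minimal sketch follows. The counts in the usage note are hypothetical and are not the study's data.

```python
import math


def tdt_statistic(b, c):
    """McNemar-type TDT chi-square (1 df).

    b: heterozygous parents who transmitted the allele of interest (e.g. C2)
    c: heterozygous parents who transmitted the other allele
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def tdt_p_value(b, c):
    """Upper-tail p-value of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(tdt_statistic(b, c) / 2))
```

For hypothetical counts b = 30 and c = 10, the statistic is 10.0 and p ≈ 0.0016. Homozygous parents never enter either count, which is why only heterozygous parents can be used.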

16.
OBJECTIVES: Modelling variation in identical-by-descent (IBD) allele sharing using covariates can increase the power to detect linkage, identify covariate-defined subgroups linked to particular marker regions, and improve the design of subsequent studies to localize genes and characterize their effects. In this report, we highlight issues that arise in studies of families with affected relatives. METHODS: Mirea et al. [Genet Epidemiol 2003, in press] extended linear and exponential linkage likelihood models [Kong and Cox, Am J Hum Genet 1997;61:1179-1188] to model variation in NPL scores among covariate-defined groups of families, and proposed likelihood ratio (LR) and t statistics to detect differences in allele sharing between groups defined by a binary covariate. Here we evaluate factors affecting the power of these tests analytically and by example, and we assess the effects of constraints, nuisance parameters, and incomplete data on test validity by simulating locus heterogeneity in families with affected siblings or affected cousins. RESULTS: Provided that constraints on the parameters are avoided, these tests are particularly useful when one subgroup has less IBD sharing than expected. The distribution of the LR statistic depends on the extent of linkage, particularly in the presence of constraints. The t statistic may be biased by group differences in information content. CONCLUSIONS: We recommend that constraints be applied cautiously and that covariate effects in IBD allele-sharing models be interpreted with care.

17.
Thanks to genome‐scale diversity data, present‐day studies can provide a detailed view of how natural and cultivated species adapt to their environment, and particularly to environmental gradients. However, because of this sensitivity, current studies may also be more affected by undocumented demographic effects, such as the pattern of migration and the reproduction regime. In this study, we provide guidelines for the use of popular or recently developed statistical methods for detecting footprints of selection. We simulated 100 populations along a selective gradient and explored different migration models, sampling schemes, and rates of self‐fertilization. We investigated the power and robustness of eight methods for detecting loci potentially under selection: three designed to detect genotype–environment correlations and five designed to detect adaptive differentiation (based on FST or similar measures). We show that genotype–environment correlation methods have substantially more power to detect selection than differentiation‐based methods, but that they generally suffer from high rates of false positives. This effect is exacerbated whenever allele frequencies are correlated, either between populations or within populations. Our results suggest that, when the underlying genetic structure of the data is unknown, a number of robust methods are preferable. Moreover, in the simulated scenario we used, sampling many populations led to better results than sampling many individuals per population. Finally, care should be taken when using methods to identify genotype–environment correlations without correcting for allele frequency autocorrelation, because of the risk of spurious signals due to allele frequency correlations between populations.

18.
The main purpose of germplasm banks is to preserve the genetic variability existing in crop species. The effectiveness of the regeneration of collections stored in gene banks is affected by factors such as sample size, random genetic drift, and seed viability. The objective of this paper is to review probability models and population genetics theory to guide the choice of sample size used for seed regeneration. Several conclusions can be drawn from the results. First, the sample size depends largely on the frequency of the least common allele or genotype. Genotypes or alleles occurring at frequencies above 10% can be preserved with a sample size of 40 individuals. A sample size of 100 individuals will preserve genotypes (alleles) that occur at frequencies of 5%. If the frequency of rare genotypes (alleles) drops below 5%, larger sample sizes are required. Second, for two, three, and four alleles per locus, the sample size required to include a copy of each allele depends more on the frequency of the rare allele or alleles than on the number of alleles. Samples of 300 to 400 are required to preserve alleles present at a frequency of 1%. Third, if seed is bulked, the expected number of parents represented in any sample drawn from the bulk will be less than the number of parents included in the bulk. Fourth, to maintain a rate of inbreeding (F) of 1%, the effective population size (Ne) should be at least 150 for three alleles and 300 for four alleles. Fifth, equalizing the reproductive output of each family to two progeny doubles the effective size of the population. Based on these results, a practical option is considered for regenerating maize seed in a program constrained by limited funds. Part of this paper was presented at the Global Maize Germplasm Workshop, CIMMYT, El Batan, Mexico, March 6–12, 1988.
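The first conclusion follows from a simple binomial sampling model. A variant at frequency f is missed by a random sample of n individuals with probability (1 − f)^n for a genotype, or (1 − f)^(2n) for an allele in a diploid under Hardy-Weinberg proportions. This is a minimal sketch of that standard model. Whether the paper used exactly this formulation is an assumption, and the function names are hypothetical.

```python
import math


def prob_captured(freq, n, copies_per_individual=1):
    """Probability that a random sample of n individuals contains the variant.

    copies_per_individual=1 for a genotype of frequency `freq`;
    copies_per_individual=2 for an allele in a diploid (HWE, random sampling).
    """
    return 1.0 - (1.0 - freq) ** (copies_per_individual * n)


def min_sample_size(freq, target, copies_per_individual=1):
    """Smallest n with prob_captured(freq, n, copies_per_individual) >= target."""
    return math.ceil(math.log(1.0 - target) /
                     (copies_per_individual * math.log(1.0 - freq)))
```

Under this model, 40 individuals capture a genotype at 10% frequency with probability about 0.985. Capturing a genotype at 5% frequency with 99% probability needs 90 individuals, and a genotype at 1% frequency needs 299 individuals for 95% probability, in line with the 100 and 300 to 400 figures quoted above.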

19.
Using an enriched genomic library, we developed seven (CT)n/(GA)n microsatellite loci for eelgrass, Zostera marina L. The enrichment procedure is described and is highly recommended for genomes in which microsatellites are rare, as in many plants. A test for polymorphism was performed on individuals from three geographically separated populations (N = 15 per population) and revealed considerable genetic variation. The number of alleles per locus ranged from 5 to 11, and observed heterozygosities for single loci ranged from 0.16 to 0.81 within populations. Mean allele lengths differed markedly among populations, indicating that these loci will be useful for studying population structure in Z. marina. Because the frequency of the most abundant multilocus genotype within populations was always < 1%, these loci have sufficient resolving power to address clone size in predominantly vegetatively reproducing populations.
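The resolving-power argument rests on how rare a given multilocus genotype is expected to be by chance. Under Hardy-Weinberg and linkage equilibrium, that expectation is the product over loci of p² for homozygotes and 2pq for heterozygotes. A minimal sketch of that calculation follows. The equilibrium assumptions and the name `multilocus_genotype_frequency` are illustrative, and the paper's own calculation may differ.

```python
def multilocus_genotype_frequency(genotype, allele_freqs):
    """Expected frequency of a multilocus genotype under Hardy-Weinberg and
    linkage equilibrium: product of p^2 (homozygote) or 2pq (heterozygote).

    genotype:     list of (allele1, allele2) pairs, one per locus
    allele_freqs: list of dicts mapping allele -> frequency, one per locus
    """
    freq = 1.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        freq *= freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return freq
```

With seven loci carrying 5 to 11 alleles each, these products quickly fall far below 1%. Identical multilocus genotypes are therefore more plausibly ramets of one clone than chance matches.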

20.
Shimatani K, Takahashi M. Heredity, 2003, 91(2):173-180
Spatial autocorrelation methods have commonly been applied in individual-based spatial genetic studies, although their properties and the relations among the statistics have not been carefully examined. This paper first introduces a reformulation of widely used spatial statistics using point processes. When Moran's I statistics are applied to allele frequencies within an individual, the frequencies are no longer continuous variables but take only three discrete values (0, 0.5 and 1 in a diploid); in this setting, both Moran's I and the number of alleles in common (NAC) can be expressed as weighted sums of join-count statistics. In Moran's I, the distributions of minor genotypes are amplified to a degree that depends on the allele frequency in the population, whereas NAC uses a constant weighting system. Under the point process framework, spatial analysis can be conducted on a common theoretical basis, from individual locations to genetic distributions at different levels (for example, genotype and allele). The methodology is demonstrated by application to field data from molecular ecological studies of Fagus crenata population dynamics.
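For reference, Moran's I for individual-level allele frequencies x_i (0, 0.5 or 1) with spatial weights w_ij is I = (n / W) · Σ_{i≠j} w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all weights. A minimal sketch of the conventional formula follows. It does not implement the paper's point-process reformulation or the join-count decomposition, and the name `morans_i` is hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values x_i with spatial weights w_ij (diagonal ignored).

    For individual-based genetic data, values are within-individual allele
    frequencies: 0, 0.5 or 1 for a given allele in a diploid.
    """
    n = len(values)
    x_bar = sum(values) / n
    dev = [x - x_bar for x in values]
    w_total = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n) if i != j)
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den
```

For four individuals in a row with neighbour weights of 1 and allele frequencies [0, 0, 1, 1], I = 1/3, indicating positive spatial autocorrelation. Because the values are restricted to three levels, every cross-product term falls into a small number of join types, which is what makes the join-count decomposition possible.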


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417