Similar Documents
20 similar documents found
1.
Extreme discordant sibling pairs (EDSPs) are theoretically powerful for the mapping of quantitative-trait loci (QTLs) in humans. EDSPs have not been used much in practice, however, because of the need to screen very large populations to find enough pairs that are extreme and discordant. Given appropriate statistical methods, another alternative is to use moderately discordant sibling pairs (MDSPs), pairs that are discordant but not at the far extremes of the distribution. Such pairs can be powerful yet far easier to collect than extreme discordant pairs. Recent work on statistical methods for QTL mapping in humans has included a number of methods that, though not developed specifically for discordant pairs, may well be powerful for MDSPs and possibly even EDSPs. In the present article, we survey the new statistics and discuss their applicability to discordant pairs. We then use simulation to study the type I error and the power of various statistics for EDSPs and for MDSPs. We conclude that the best statistic(s) for discordant pairs (moderate or extreme) is (are) to be found among the new statistics. We suggest that the new statistics are appropriate for many other designs as well, and that, in fact, they open the way for the exploration of entirely novel designs.

2.
3.
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the powers obtained by assuming, in turn, that each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors determine average power to detect association: disease model, average marker-marker disequilibrium, haplotype diversity, and trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies.
Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
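As a rough illustration of how power follows from a non-centrality parameter, the sketch below assumes the simplest case of a 1-degree-of-freedom chi-square test (for example, a single-SNP allelic test), not the multi-degree-of-freedom haplotype likelihood ratio test described above. The function name `power_1df_chisq` is hypothetical.

```python
from statistics import NormalDist


def power_1df_chisq(ncp: float, alpha: float = 0.01) -> float:
    """Power of a 1-df chi-square test with non-centrality parameter ncp.

    Assumption: 1 degree of freedom only. A 1-df non-central chi-square
    variable equals (Z + sqrt(ncp))**2 with Z ~ N(0, 1), so the tail
    probability beyond the critical value reduces to two normal tails.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    shift = ncp ** 0.5
    upper_tail = 1 - nd.cdf(z_crit - shift)  # P(Z + shift > z_crit)
    lower_tail = nd.cdf(-z_crit - shift)     # P(Z + shift < -z_crit)
    return upper_tail + lower_tail
```

For example, a non-centrality parameter of about 7.85 gives roughly 80% power at α = 0.05. A haplotype test with more degrees of freedom would need the full non-central chi-square distribution (for instance `scipy.stats.ncx2`).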

4.
Polygenic scores have recently been used to summarise genetic effects among an ensemble of markers that do not individually achieve significance in a large-scale association study. Markers are selected using an initial training sample and used to construct a score in an independent replication sample by forming the weighted sum of associated alleles within each subject. Association between a trait and this composite score implies that a genetic signal is present among the selected markers, and the score can then be used for prediction of individual trait values. This approach has been used to obtain evidence of a genetic effect when no single markers are significant, to establish a common genetic basis for related disorders, and to construct risk prediction models. In some cases, however, the desired association or prediction has not been achieved. Here, the power and predictive accuracy of a polygenic score are derived from a quantitative genetics model as a function of the sizes of the two samples, explained genetic variance, selection thresholds for including a marker in the score, and methods for weighting effect sizes in the score. Expressions are derived for quantitative and discrete traits, the latter allowing for case/control sampling. A novel approach to estimating the variance explained by a marker panel is also proposed. It is shown that published studies with significant association of polygenic scores have been well powered, whereas those with negative results can be explained by low sample size. It is also shown that useful levels of prediction may only be approached when predictors are estimated from very large samples, up to an order of magnitude greater than currently available. Therefore, polygenic scores currently have more utility for association testing than predicting complex traits, but prediction will become more feasible as sample sizes continue to grow.
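The score construction described above (select markers by a training-sample threshold, then sum weighted associated-allele counts per subject) can be sketched as follows. The marker IDs, the P-value selection rule and the use of training effect sizes as weights are illustrative assumptions; the helper names are hypothetical.

```python
from typing import Dict, List


def select_markers(training_p: Dict[str, float], threshold: float) -> List[str]:
    """Keep markers whose training-sample P value falls below the selection threshold."""
    return [marker for marker, p in training_p.items() if p < threshold]


def polygenic_score(genotype: Dict[str, int],
                    weights: Dict[str, float],
                    markers: List[str]) -> float:
    """Weighted sum of associated-allele counts (0, 1 or 2) over the selected markers.

    `weights` are assumed to be effect sizes estimated in the training sample;
    `genotype` holds one replication-sample subject's allele counts.
    """
    return sum(weights[m] * genotype[m] for m in markers)


# Usage with made-up values: rs2 is dropped by the 0.05 threshold.
markers = select_markers({"rs1": 0.001, "rs2": 0.2, "rs3": 0.04}, threshold=0.05)
score = polygenic_score({"rs1": 2, "rs2": 1, "rs3": 0},
                        {"rs1": 0.3, "rs2": -0.1, "rs3": 0.2},
                        markers)
```

Raising or lowering the threshold changes how many weakly associated markers enter the sum, which is one of the design factors the abstract analyses.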

5.
The study of genetic linkage or association in complex traits requires large sample sizes, as the expected effect sizes are small and extremely low significance levels need to be adopted. One possible way to reduce the numbers of phenotypings and genotypings is the use of a sequential study design. Here, average sample sizes are decreased by conducting interim analyses with the possibility to stop the investigation early if the result is significant. We applied optimized group sequential study designs to the analysis of genetic linkage (one-sided mean test) and association (two-sided transmission/disequilibrium test). For designs with two and three stages at overall significance levels of .05 and .0001 and a power of .8, we calculated necessary sample sizes, time points, and critical boundaries for interim and final analyses. Monte Carlo simulation analyses were performed to confirm the validity of the asymptotic approximation. Furthermore, we calculated average sample sizes required under the null and alternative hypotheses in the different study designs. Compared with the fixed-sample design, the group sequential design increased the sample size by at most 8% under the null hypothesis. This was contrasted by savings of up to 20% in average sample sizes under the alternative hypothesis, depending on the applied design. These savings reduce the amounts of genotyping and phenotyping required for a study and can therefore substantially decrease cost and time.

6.
We have compared the power of several allele-sharing statistics for "nonparametric" linkage analysis of X-linked traits in nuclear families and extended pedigrees. Our rationale was that, although several of these statistics have been implemented in popular software packages, there has been no formal evaluation of their relative power. Here, we evaluate the relative performance of five test statistics, including two new test statistics. We considered sibships of sizes two through four, four different extended pedigrees, 15 different genetic models (12 single-locus models and 3 two-locus models), and varying recombination fractions between the marker and the trait locus. We analytically estimated the sample sizes required for 80% power at a significance level of .001 and also used simulation methods to estimate power for a sample size of 10 families. We tried to identify statistics whose power was robust over a wide variety of models, with the idea that such statistics would be particularly useful for detection of X-linked loci associated with complex traits. We found that a commonly used statistic, S(all), generally performed well under various conditions and had close to the optimal sample sizes in most cases, but that there were certain cases in which it performed quite poorly. Our two new statistics did not perform any better than those already in the literature. We also note that, under dominant and additive models, regardless of the statistic used, pedigrees with all-female siblings have very little power to detect X-linked loci.

7.
Segregation analysis, employing nuclear families, is the most frequently used method to evaluate the mode of inheritance of a trait. To our knowledge, there exist no tables of the numbers of individuals and families required to perform a significance test of a specific segregation ratio at a predetermined power and significance level. To fill this gap, we have developed sample-size tables based on the asymptotic variance of the maximum likelihood estimate of the segregation ratio and on the normal approximation for two-sided hypothesis testing. Assuming homogeneous sibship size, minimum sample sizes were determined for testing the null hypothesis of a segregation ratio of 1/4 or 1/2 against alternative values of .05-.80, at a significance level of .05 and power of .8, for ascertainment probabilities from nearly 0 to 1.0, and for sibship sizes 2-7. The results of these calculations indicate a complex interaction of the null and the alternative hypotheses, ascertainment probability, and sibship size in determining the sample size required for simple segregation analysis. The accompanying tables should aid in the appropriate design and cost assessment of future genetic epidemiologic studies.
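To show the general shape of a normal-approximation sample-size calculation for a segregation ratio, the sketch below makes strong simplifying assumptions that the tabulated method does not: offspring are treated as independent Bernoulli observations, with no ascertainment correction and no sibship structure. It is therefore the textbook one-sample-proportion formula, not the authors' ascertainment-adjusted variance; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_segregation_ratio(p0: float, p1: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Offspring needed to distinguish segregation ratio p1 from p0.

    Simplification (assumption): independent offspring, random sampling,
    two-sided normal-approximation test of a single proportion.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = nd.inv_cdf(power)          # quantile for the desired power
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p1 - p0)) ** 2)
```

For instance, `n_for_segregation_ratio(0.25, 0.5)` evaluates to 26 offspring under these assumptions. Incomplete ascertainment inflates the variance of the estimate, which is why the abstract's tables depend on the ascertainment probability and sibship size.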

8.
Luo ZW, Wu CI. Genetics, 2001, 158(4): 1785-1800
Linkage disequilibrium is an important topic in evolutionary and population genetics. An issue yet to be settled is the theory required to extend the linkage disequilibrium analysis to complex traits. In this study, we present theoretical analysis and methods for detecting or estimating linkage disequilibrium (LD) between a polymorphic marker locus and any one of the loci affecting a complex dichotomous trait on the basis of samples randomly or selectively collected from natural populations. Statistical properties of these methods were investigated and their powers were compared analytically or by use of Monte Carlo simulations. The results show that the disequilibrium may be detected with a power of 80% by using phenotypic records and marker genotype when both the trait and marker variants are common (30%) and the LD is relatively high (40-100% of the theoretical maximum). The maximum-likelihood approach provides accurate estimates of the model parameters as well as detection of linkage disequilibrium. The likelihood method is preferred for its higher power and reliability in parameter estimation. The approaches developed in this article are also compared to those for analyzing a continuously distributed quantitative trait. It is shown that a larger sample size is required for the dichotomous trait model to obtain the same level of power in detecting linkage disequilibrium as the continuous trait analysis. Potential use of these estimates in mapping the trait locus is also discussed.
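The abstract expresses LD strength as a percentage of its theoretical maximum. One standard way to do that is Lewontin's normalized coefficient D' = D / D_max; assuming that is the intended scale (the abstract does not say so explicitly), it can be computed from haplotype and allele frequencies as sketched below. The function name is hypothetical.

```python
from typing import Tuple


def ld_d_prime(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, D') for alleles A and B at two loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies.
    D' rescales D by the largest |D| attainable given the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime
```

With both variants at frequency 0.3 (the "common" case in the abstract), a haplotype frequency of 0.2 gives D = 0.11 and D' of about 0.52, i.e. roughly half of the theoretical maximum.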

9.
Selection strategies for linkage studies using twins.
Genetic linkage analysis for complex diseases offers a major challenge to geneticists. In these complex diseases multiple genetic loci are responsible for the disease and they may vary in the size of their contribution; the effect of any single one of them is likely to be small. In many situations, such as in extensive twin registries, trait values have been recorded for a large number of individuals, and preliminary studies have revealed summary measures for those traits, such as mean, variance and components of variance, including heritability. Given the small effect size, a random sample of twins will require a prohibitively large sample size. It is well known that selective sampling is far more efficient in terms of genotyping effort. In this paper we derive simple expressions for the information contributed by sib pairs for the detection of linkage to a quantitative trait locus (QTL). We consider random samples as well as samples of sib pairs selected on the basis of their trait values. These expressions can be rapidly computed and do not involve simulation. We extend our results for quantitative traits to dichotomous traits using the concept of a liability threshold model. We present tables with required sample sizes for height, insulin levels and migraine, three of the traits studied in the GenomEUtwin project.
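The liability threshold model mentioned above treats a dichotomous trait as an underlying standard-normal liability, with individuals affected when liability exceeds a threshold set by the prevalence K. A minimal sketch of that mapping follows; it shows only the threshold and the mean liability of affected individuals, not the authors' information expressions, and the function names are hypothetical.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold t on a standard-normal liability scale with P(liability > t) = prevalence."""
    return NormalDist().inv_cdf(1 - prevalence)


def mean_liability_affected(prevalence: float) -> float:
    """Mean liability of affected individuals, phi(t) / K for a standard normal truncated at t."""
    t = liability_threshold(prevalence)
    return NormalDist().pdf(t) / prevalence
```

For a prevalence of 0.5 the threshold is 0 and affected individuals have mean liability of about 0.80; rarer traits push both the threshold and the affected mean further into the upper tail.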

10.
The internal pilot study design enables the estimation of nuisance parameters required for sample size calculation on the basis of data accumulated in an ongoing trial. In this way, misspecifications made when determining the sample size in the planning phase can be corrected using updated knowledge. According to regulatory guidelines, blindness of all personnel involved in the trial has to be preserved and the specified type I error rate has to be controlled when the internal pilot study design is applied. Especially in the late phase of drug development, most clinical studies are run in more than one centre. In these multicentre trials, one may have to deal with an unequal distribution of the patient numbers among the centres. Depending on the type of the analysis (weighted or unweighted), unequal centre sample sizes may lead to a substantial loss of power. Like the variance, the magnitude of imbalance is difficult to predict in the planning phase. We propose a blinded sample size recalculation procedure for the internal pilot study design in multicentre trials with normally distributed outcome and two balanced treatment groups that are analysed applying the weighted or the unweighted approach. The method addresses uncertainty with respect to both the variance of the endpoint and the extent of disparity of the centre sample sizes. The actual type I error rate as well as the expected power and sample size of the procedure are investigated in simulation studies. For both the weighted and the unweighted analysis, the maximal type I error rate was either not exceeded or exceeded only minimally. Furthermore, application of the proposed procedure led to an expected power that achieves the specified value in many cases and is always very close to it.
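The core idea of blinded recalculation can be sketched with the simplest single-centre case: estimate the variance from the pooled internal-pilot data without using treatment labels (the "lumped" one-sample variance), then plug it into the usual two-group normal-approximation formula. This deliberately ignores the centre imbalance that the proposed procedure handles; the function name and default parameters are hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import List


def recalculated_n_per_group(pooled_blinded: List[float], delta: float,
                             alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size recomputed from blinded internal-pilot data.

    Assumptions: two balanced groups, normal outcome, a single centre, and
    the lumped variance of all pilot observations as the variance estimate
    (treatment labels stay blinded). delta is the clinically relevant difference.
    """
    s2 = statistics.variance(pooled_blinded)  # sample variance, labels ignored
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(2 * z * z * s2 / delta ** 2)
```

Because the lumped variance also absorbs the between-group difference (about Δ²/4 with balanced groups), this estimate tends to be slightly conservative, which is one reason blinded procedures need the type I error and power checks the abstract describes.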

11.
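Genetic analysis of quantitative traits can be carried out by "selective genotyping". This paper proposes a statistic, T, for association analysis of quantitative trait loci (QTLs) using extreme samples. The statistic T compares differences in trait values among individuals carrying homozygous markers in the upper-extreme population sample. Computer simulation was used to examine the distribution of T and its type I error rate in the absence of association. The results show that, under various sample-selection strategies, the distribution of T approximates a χ² distribution and the type I error rate is close to the nominal significance level. The effects of heritability, sample size and sample-selection threshold on the statistical power of T were also examined under various genetic models. The results show that the power of T increases with the degree of linkage disequilibrium between the marker and the QTL and with increasing heritability and sample size, and that power is greater when the sample-selection threshold is more stringent.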

12.
We are concerned here with practical issues in the application of extreme sib-pair (ESP) methods to quantitative traits. Two important factors, namely the way extreme trait values are defined and the proportions in which different types of ESPs are pooled in the analysis, are shown to determine the power and the cost effectiveness of a study design. We found that, in general, combining reasonable numbers of both extremely discordant and extremely concordant sib pairs that were available in the sample is more powerful and more cost effective than pursuing only a single type of ESP. We also found that dividing trait values with a less extreme threshold at one end or at both ends of the trait distribution leads to more cost-effective designs. The notion of generalized relative risk ratios (the lambda methods, as described in the first part of this series of two articles) is used to calculate the power and sample size for various choices of polychotomization of trait values and for the combination of different types of ESPs. A balance then can be struck among these choices, to attain an optimum design.

13.
We have compared the power of a large number of allele-sharing statistics for "nonparametric" linkage analysis with affected sibships. Our rationale was that there is an extensive literature comparing statistics for sibling pairs but that there has not been much guidance on how to choose statistics for studies that include sibships of various sizes. We concentrated on statistics that can be described as assigning scores to each identity-by-descent-sharing configuration that a pedigree might take on (Whittemore and Halpern 1994). We considered sibships of sizes two through five, 27 different genetic models, and varying recombination fractions between the marker and the trait locus. We tried to identify statistics whose power was robust over a wide variety of models. We found that the statistic that is probably used most often in such studies-S(all)-performs quite well, although it is not necessarily the best. We also found several other statistics (such as the R criterion, S(robdom), and the Sobel-and-Lange statistic C) that perform well in most situations, a few (such as S(-#geno) and the Feingold-and-Siegmund version of S(pairs)) that have high power only in very special situations, and a few (such as S(-#geno), the N criterion, and the Sobel-and-Lange statistic B) that seem to have low power for the majority of the trait models. For the most part, the same statistics performed well for all sibship sizes. We also used our results to give some suggestions regarding how to weight sibships of different sizes, in forming an overall statistic.

14.
Body size is a classic quantitative trait with evolutionarily significant variation within many species. Locating the alleles responsible for this variation would help us understand the maintenance of variation in body size in particular, as well as quantitative traits in general. However, successful genome-wide association of genotype and phenotype may require very large sample sizes if alleles have low population frequencies or modest effects. As a complementary approach, we propose that population-based resequencing of experimentally evolved populations allows for considerable power to map functional variation. Here, we use this technique to investigate the genetic basis of natural variation in body size in Drosophila melanogaster. Significant differentiation of hundreds of loci in replicate selection populations supports the hypothesis that the genetic basis of body size variation is very polygenic in D. melanogaster. Significantly differentiated variants are limited to single genes at some loci, allowing precise hypotheses to be formed regarding causal polymorphisms, while other significant regions are large and contain many genes. By using significantly associated polymorphisms as a priori candidates in follow-up studies, these data are expected to provide considerable power to determine the genetic basis of natural variation in body size.

15.
Experimental error control is very important in quantitative trait locus (QTL) mapping. Although numerous statistical methods have been developed for QTL mapping, a QTL detection model based on an appropriate experimental design that emphasizes error control has not been developed. Lattice design is very suitable for experiments with large sample sizes, which are usually required for accurate mapping of quantitative traits. However, because no QTL mapping method based on lattice design has been available, the arithmetic mean or adjusted mean of each line of observations in the lattice design has had to be used as the response variable, resulting in low QTL detection power. As an improvement, we developed a QTL mapping method termed composite interval mapping based on lattice design (CIMLD). In the lattice design, experimental errors are decomposed into random errors and block-within-replication errors. Four levels of block-within-replication errors were simulated to show the power of QTL detection under different error controls. The simulation results showed that the arithmetic mean method, which is equivalent to a method under randomized complete block design (RCBD), was very sensitive to the size of the block variance: as the block variance increased, the power of QTL detection decreased from 51.3% to 9.4%. In contrast to the RCBD method, the power of CIMLD and the adjusted mean method did not change for different block variances. The CIMLD method showed 1.2- to 7.6-fold higher power of QTL detection than the arithmetic or adjusted mean methods. Our proposed method was applied to real soybean (Glycine max) data as an example, and 10 QTLs for biomass were identified that explained 65.87% of the phenotypic variation, while only three and two QTLs were identified by the arithmetic and adjusted mean methods, respectively.

16.
Fan R, Floros J, Xiong M. Human Heredity, 2002, 53(3): 130-145
In this paper, we explore models and tests for association and linkage studies of a quantitative trait locus (QTL) linked to a multi-allele marker locus. Based on the difference between an offspring's conditional trait means of receiving and not receiving an allele from a parent at the marker locus, we propose three statistics T(m), T(m,row) and T(m,col) to test association or linkage disequilibrium between the marker locus and the QTL. These tests are composite tests, and use the offspring marginal sample means including offspring data of both homozygous and heterozygous parents. For the linkage study, we calculate the offspring's conditional trait mean given the allele transmission status of a heterozygous parent at the marker locus. Based on the difference between the conditional means of a transmitted and a nontransmitted allele from a heterozygous parent, we propose statistics T(parsi), T(satur), T(gen) and T(m,het) to perform composite tests of linkage between the marker locus and the quantitative trait locus in the presence of association. These tests only use the offspring data that are related to the heterozygous parents at the marker locus. T(parsi) is a parsimonious or allele-wise statistic, T(satur) and T(gen) are saturated or genotype-wise statistics, and T(m,het) compares the row and column sample means for offspring data of heterozygous parents. After comparing the powers and the sample sizes, we conclude that T(parsi) has higher power than the bi-allele tests, T(satur), T(gen), and T(m,het). If there is tight linkage between the marker and the trait locus, T(parsi) is powerful in detecting linkage between the marker and the trait locus in the presence of association. By investigating the goodness-of-fit of T(parsi), we find that T(satur) does not gain much power compared to T(parsi). Moreover, T(parsi) takes into account the pattern of the data that is consistent with linkage and linkage disequilibrium.
As the number of alleles at the marker locus increases, T(parsi) is very conservative, and can be useful even for sparse data. To illustrate the usefulness and the power of the methods proposed in this paper, we analyze the chromosome 6 data from the Oxford asthma data set of Genetic Analysis Workshop 12.

17.
Ball RD. Genetics, 2005, 170(2): 859-873
A method is given for the design of experiments to detect associations (linkage disequilibrium) in a random population between a marker and a quantitative trait locus (QTL), or gene, with a given strength of evidence, as defined by the Bayes factor. Using a version of the Bayes factor that can be linked to the value of an F-statistic with an existing deterministic power calculation makes it possible to rapidly evaluate a comprehensive range of scenarios, demonstrating the feasibility, or otherwise, of detecting genes of small effect. The Bayes factor is advocated for use in determining optimal strategies for selecting candidate genes for further testing or applications. The prospects for fine-scale mapping of QTL are reevaluated in this framework. We show that large sample sizes are needed to detect small-effect genes with a respectable-sized Bayes factor, and that, to have good power to detect a QTL allele at low frequency, it is necessary to have a marker with similar allele frequency near the gene.

18.
Cui Y, Zhang F, Xu J, Li Z, Xu S. Heredity, 2015, 115(6): 538-546
Quantitative trait locus (QTL) mapping is often conducted in line-crossing experiments where a sample of individuals is randomly selected from a pool of all potential progeny. QTLs detected from such an experiment are important for us to understand the genetic mechanisms governing a complex trait, but may not be directly relevant to plant breeding if they are not detected from the breeding population at which selection is targeted. QTLs segregating in one population may not necessarily segregate in another population. To facilitate marker-assisted selection, QTLs must be detected from the very population at which the selection is targeted. However, selected breeding populations often have depleted genetic variation with small population sizes, resulting in low power in detecting useful QTLs. On the other hand, if selection is effective, loci controlling the selected trait will deviate from the expected Mendelian segregation ratio. In this study, we proposed to detect QTLs in selected breeding populations via the detection of marker segregation distortion in either a single population or multiple populations using the same selection scheme. Simulation studies showed that QTLs can be detected in strongly selected populations with selected population sizes as small as 25 plants. We applied the new method to detect QTLs in two breeding populations of rice selected for high grain yield. Seven QTLs were identified, four of which have been validated in advanced generations in a follow-up study. Cloned genes in the vicinity of the four QTLs have also been reported in the literature. This mapping-by-selection approach provides a new avenue for breeders to improve breeding progress. The new method can be applied to breeding programs not only in rice but also in other agricultural species including crops, trees and animals.
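The signal this method exploits is a marker whose genotype counts deviate from the Mendelian expectation. A basic single-marker check is a chi-square goodness-of-fit test; the sketch below assumes a population with an expected 1:1 ratio between two homozygous marker classes (as in recombinant inbred or doubled-haploid lines), whereas an F2 would need a 1:2:1 expectation with 2 degrees of freedom. It is an illustration of segregation-distortion testing in general, not the authors' multi-population statistic, and the function name is hypothetical.

```python
import math
from typing import Tuple


def segregation_distortion_test(count_a: int, count_b: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of two marker-class counts against a 1:1 ratio.

    Returns (chi-square statistic, P value). With 1 degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    n = count_a + count_b
    expected = n / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, 18 versus 7 plants in a selected population of 25 gives a statistic of 4.84 and P of about 0.028, whereas a 50:50 split gives a statistic of 0 and P = 1.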

19.
OBJECTIVES: Some traits, while naturally polychotomous, are routinely dichotomized for genetic analysis. Dichotomization, intuitively, leads to a loss of power to detect linkage, as some phenotypic variability is discarded. This paper examines this power loss in the context of a trichotomous trait. METHODS: To examine this power loss, we performed a simulation study where a trichotomous trait was simulated in a sample of 1,000 sib-pairs under various genetic models. The study was replicated 1,000 times. Linkage analysis using a variance components method, as implemented in Mx, was then performed on the trichotomous trait and compared with that on a dichotomized version of the trait. RESULTS: A comparison of the power and false positive rates of the analyses shows that power to detect linkage was increased by up to 22 percentage points simply by examining the trait as a trichotomy instead of a dichotomy. Under all models examined, the trichotomous analysis outperformed the dichotomous version. CONCLUSIONS: Comparable levels of false positive rates under both methods confirm that this power gain comes solely from the information lost upon dichotomization. Thus, dichotomizing tri- or poly-chotomous traits can lead to crippling power loss, especially in the case of many loci of small effect.

20.
Standard sample size calculation formulas for stepped wedge cluster randomized trials (SW-CRTs) assume that cluster sizes are equal. When cluster sizes vary substantially, ignoring this variation may lead to an under-powered study. We investigate the relative efficiency of a SW-CRT with varying cluster sizes to equal cluster sizes, and derive variance estimators for the intervention effect that account for this variation under a mixed effects model—a commonly used approach for analyzing data from cluster randomized trials. When cluster sizes vary, the power of a SW-CRT depends on the order in which clusters receive the intervention, which is determined through randomization. We first derive a variance formula that corresponds to any particular realization of the randomized sequence and propose efficient algorithms to identify upper and lower bounds of the power. We then obtain an “expected” power based on a first-order approximation to the variance formula, where the expectation is taken with respect to all possible randomization sequences. Finally, we provide a variance formula for more general settings where only the cluster size arithmetic mean and coefficient of variation, instead of exact cluster sizes, are known in the design stage. We evaluate our methods through simulations and illustrate that the average power of a SW-CRT decreases as the variation in cluster sizes increases, and the impact is largest when the number of clusters is small.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  Beijing ICP Registration No. 09084417