Similar documents
1.
One of the first and most important steps in planning a genetic association study is the accurate estimation of statistical power under a proposed study design and sample size. In association studies of candidate genes or in fine-mapping applications, allele and genotype frequencies are often assumed to be known when, in fact, they are unknown (i.e., they are random variables from some distribution). For example, for a diallelic marker with allele frequencies of 0.5 and 0.5 in Hardy-Weinberg proportions, the three genotype frequencies are often assumed to be exactly 0.25, 0.50, and 0.25, and statistical power is calculated from these values. Unfortunately, ignoring this source of variation can inflate the estimated power of the study. In the present article, we propose averaging power over the distribution of the genotype frequencies to obtain the true power for a fixed allele frequency. For the usual situation, in which the allele frequencies in a population are not known, we propose placing a prior distribution on the allele frequency, taking advantage of any available genotype information. This Bayesian approach provides a more accurate estimate of power. We present examples for quantitative and qualitative traits in cohort studies of unrelated individuals, together with results from an extensive series of examples showing that ignoring the uncertainty in allele frequencies can inflate the estimated power of the study. We also present results for case-control studies and show that standard methods may overestimate power there as well. As discussed in this article, fixing allele frequencies even when they are unknown is the common approach to power calculation. We show that ignoring the sources of variation in allele frequencies tends to overestimate power and, consequently, to produce studies that are underpowered. Software in C is available at http://www.ambrosius.net/Power/.
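The effect described above can be illustrated with a minimal sketch, assuming a quantitative trait, an additive SNP effect tested with a two-sided 1-df test at the 5% level, and Hardy-Weinberg proportions so that the genotype variance is 2p(1 - p). It compares power at a fixed allele frequency with power averaged over a Beta prior on the allele frequency. The Beta(10, 10) prior, the effect size, and all function names are illustrative assumptions; this is not the authors' C software, and it does not reproduce their averaging over genotype-frequency sampling or their use of genotype information to update the prior.

```python
import math
import random

Z_CRIT = 1.959963984540054  # two-sided 5% critical value of N(0, 1)


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_fixed(p: float, n: int, beta: float, sigma2: float = 1.0) -> float:
    """Power of a two-sided 1-df test of an additive SNP effect, treating the
    allele frequency p as known; Var(G) = 2p(1 - p) under Hardy-Weinberg."""
    ncp = n * beta ** 2 * 2.0 * p * (1.0 - p) / sigma2  # non-centrality
    root = math.sqrt(ncp)
    return normal_cdf(root - Z_CRIT) + normal_cdf(-root - Z_CRIT)


def power_averaged(a: float, b: float, n: int, beta: float,
                   draws: int = 20000, seed: int = 1) -> float:
    """Monte Carlo average of power_fixed over p ~ Beta(a, b) (hypothetical prior)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += power_fixed(rng.betavariate(a, b), n, beta)
    return total / draws


fixed = power_fixed(0.5, n=500, beta=0.2)
averaged = power_averaged(10, 10, n=500, beta=0.2)
print(f"fixed p = 0.5: {fixed:.3f}   averaged over Beta(10, 10): {averaged:.3f}")
```

Because 2p(1 - p) is largest at p = 0.5, every draw away from 0.5 lowers the non-centrality, so in this particular setup the averaged power is necessarily below the fixed-frequency value.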

2.
We show that single-nucleotide polymorphisms (SNPs) of moderate to high heterozygosity (minor allele frequencies >10%) can be efficiently detected, and their allele frequencies accurately estimated, by pooling DNA samples and applying capillary-based SSCP analysis. In this method, alleles are separated into peaks, and their frequencies can be reliably and accurately quantified from the peak heights (SD <1.8%). We found that as many as 40% of the publicly available SNPs analyzed by this method have widely differing allele frequency distributions among groups of different ethnicity (parents of Centre d'Etude du Polymorphisme Humain (CEPH) families vs. Japanese individuals). These results demonstrate the effectiveness of the present pooling method for re-evaluating candidate SNPs that were collected by examining limited numbers of individuals. The method should also serve as a robust quantitative technique for studies in which a precise estimate of SNP allele frequencies is essential, for example, linkage disequilibrium analysis.
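Estimating a pooled allele frequency from peak heights can be sketched as below, assuming a common calibration in which the peak-height ratio observed in a known heterozygote (true allele ratio 1:1) corrects for unequal signal strength of the two alleles. The abstract does not state that this exact correction was used; the function and parameter names are hypothetical.

```python
def pooled_allele_frequency(peak_a: float, peak_b: float,
                            het_a: float, het_b: float) -> float:
    """Estimate the frequency of allele A in a DNA pool from peak heights.

    het_a / het_b is the peak-height ratio measured in a known heterozygote,
    whose true allele ratio is 1:1; it rescales allele B's signal so that
    both alleles are on the same scale (assumed calibration).
    """
    k = het_a / het_b
    return peak_a / (peak_a + k * peak_b)


# Allele A gives twice the signal of allele B in a heterozygote (2:1 ratio),
# so raw pool peaks of 60 vs 70 correspond to a corrected frequency of 0.3.
print(round(pooled_allele_frequency(60, 70, het_a=2.0, het_b=1.0), 3))
```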

3.
Genomic selection (GS) is a DNA-based method of selecting for quantitative traits in animal and plant breeding, and offers a potentially superior alternative to traditional breeding methods that rely on pedigree and phenotype information. Using a 60K SNP chip with markers spaced throughout the chicken genome, we compared the impact of GS and traditional BLUP (best linear unbiased prediction) selection applied side by side in three different lines of egg-laying chickens. The methods differed both in the magnitude and in the genomic distribution of allele frequency changes. In all three lines, the average allele frequency changes were larger with GS (0.056, 0.064 and 0.066) than with BLUP (0.044, 0.045 and 0.036) for lines B1, B2 and W1, respectively. With BLUP, 35 selected regions (empirical P<0.05) were identified across the three lines; with GS, 70 selected regions were identified. Empirical thresholds for local allele frequency changes were determined by gene dropping and differed considerably between GS (0.167–0.198) and BLUP (0.105–0.126). Between lines, the genomic regions with large changes in allele frequencies showed limited overlap. Our results show that GS applies selection pressure much more locally than BLUP, resulting in larger allele frequency changes. These results provide novel insights into the nature of selection on quantitative traits and raise important questions regarding the long-term impact of GS. The rapid changes to one part of the genetic architecture, while another part may remain unselected, at least in the short term, require careful consideration, especially when selection occurs before phenotypes are observed.
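The idea behind an empirical threshold for allele frequency change under neutrality can be sketched as follows. The study used gene dropping through the actual pedigree; this simplified stand-in instead simulates a Wright-Fisher population of constant size, so the population sizes, generation counts and function name are all assumptions for illustration.

```python
import random


def drift_threshold(p0: float, n_individuals: int, generations: int,
                    reps: int = 1000, quantile: float = 0.95,
                    seed: int = 7) -> float:
    """Empirical threshold for |allele frequency change| under neutral drift.

    Simulates a Wright-Fisher population of constant size (2N gene copies
    drawn binomially each generation) rather than dropping genes through a
    real pedigree, and returns the chosen quantile of |p_t - p0|.
    """
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    changes = []
    for _ in range(reps):
        p = p0
        for _ in range(generations):
            # Binomial sampling of 2N gene copies with success probability p.
            count = sum(1 for _ in range(two_n) if rng.random() < p)
            p = count / two_n
        changes.append(abs(p - p0))
    changes.sort()
    return changes[int(quantile * (reps - 1))]


small = drift_threshold(0.5, n_individuals=50, generations=4)
large = drift_threshold(0.5, n_individuals=500, generations=4, reps=300)
print(f"N = 50: {small:.3f}   N = 500: {large:.3f}")
```

Observed changes above such a threshold are candidates for selection rather than drift; smaller effective population sizes give wider thresholds.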

4.
The performance of linear regression models in genome-wide association studies is influenced by how marker information is parameterized in the model. Considering the impact of parameterization is especially important when using information from multiple markers to test for association. Properties of the population, such as linkage disequilibrium (LD) and allele frequencies, will also affect the ability of a model to provide statistical support for an underlying quantitative trait locus (QTL). Thus, for a given location in the genome, the relationship between population properties and model parameterization is expected to influence the performance of the model in providing evidence for the position of a QTL. As LD and allele frequencies vary throughout the genome and between populations, understanding the relationship between these properties and model parameterization is of considerable importance in order to make optimal use of available genomic data. Here, we evaluate the performance of regression-based association models using genotype and haplotype information across the full spectrum of allele frequency and LD scenarios. Genetic marker data from 200 broiler chickens were used to simulate genomic conditions by selecting individual markers to act as surrogate QTL (sQTL) and then investigating the ability of surrounding markers to estimate sQTL genotypes and provide statistical support for their location. The LD and allele frequencies of markers and sQTL are shown to have a strong effect on the performance of models relative to one another. Our results indicate the best choice of model parameterization for given scenarios of marker and QTL LD and allele frequencies. We demonstrate a clear advantage of haplotype-based models that account for phase uncertainty over the other models tested, particularly for QTL with low minor allele frequencies.
We show that the greatest advantage of haplotype models over single-marker models occurs when LD between the markers and the causal locus is low. In these situations, haplotype models predict the location of the QTL more accurately than the other models tested.
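Since model performance above is discussed in terms of marker-QTL LD, it may help to recall how the standard pairwise LD measures D and r² are computed from haplotype and allele frequencies. The sketch below uses the textbook definitions; the study's own LD computations are not described in the abstract, and the function name is hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two diallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2; p_a, p_b: the corresponding allele frequencies.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


print(ld_stats(0.125, 0.5, 0.25))  # independent loci: D = 0, r^2 = 0
print(ld_stats(0.5, 0.5, 0.5))     # complete association: r^2 = 1
```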

5.
A dominant phenotype of a genetic marker provides incomplete information about the marker genotype of an individual. A consequence of using this incomplete information for mapping quantitative trait loci (QTL) is that inference of the genotype of a putative QTL flanked by a marker with a dominant phenotype depends on the genotype or phenotype of the next marker. This dependence can extend further until a marker genotype is fully observed. A general algorithm is derived to calculate the probability distribution of the genotype of a putative QTL at a given genomic position, conditional on all observed marker phenotypes in the region, including dominant and missing marker information, for an individual. The algorithm is implemented for various populations derived from two inbred lines in the context of QTL mapping. Simulation results show that if only a proportion of markers have missing or dominant phenotypes, QTL mapping can be almost as efficient as if there were no missing information in the data. The efficiency of the analysis, however, may decrease substantially when a very large proportion of markers have missing or dominant phenotypes and a genetic map must first be reconstructed from the same data as well. It is therefore important to combine dominant markers with codominant markers in a QTL mapping study. This revised version was published online in July 2006 with corrections to the Cover Date.
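The way inference "reaches past" a missing or uninformative marker to the next observed one can be sketched with a forward-backward calculation. This is a simplified illustration, not the authors' general algorithm: it assumes a backcross (two genotype classes), Haldane's map function (no interference), and treats missing or uninformative dominant phenotypes as missing data. All names are hypothetical.

```python
import math
from typing import Optional, Sequence


def haldane(d: float) -> float:
    """Recombination fraction for map distance d (Morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def qtl_genotype_probs(positions: Sequence[float],
                       observed: Sequence[Optional[int]],
                       qtl_index: int) -> list[float]:
    """P(genotype = 0 or 1 at positions[qtl_index] | all observed markers).

    Backcross model: genotype 0 = homozygous, 1 = heterozygous. Missing or
    uninformative (e.g. dominant) marker phenotypes are passed as None.
    """
    n = len(positions)

    def emit(i: int, s: int) -> float:
        g = observed[i]
        return 1.0 if g is None or g == s else 0.0

    def trans(i: int, s: int, t: int) -> float:
        r = haldane(positions[i + 1] - positions[i])
        return r if s != t else 1.0 - r

    # Forward pass: joint probability of markers up to i and state at i.
    fwd = [[0.5 * emit(0, s) for s in (0, 1)]]
    for i in range(1, n):
        fwd.append([emit(i, t) * sum(fwd[i - 1][s] * trans(i - 1, s, t)
                                     for s in (0, 1)) for t in (0, 1)])
    # Backward pass: probability of markers after i given state at i.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(trans(i, s, t) * emit(i + 1, t) * bwd[i + 1][t]
                      for t in (0, 1)) for s in (0, 1)]
    post = [fwd[qtl_index][s] * bwd[qtl_index][s] for s in (0, 1)]
    total = sum(post)
    return [x / total for x in post]


# QTL at 10 cM; the marker at 20 cM is missing, so inference reaches 30 cM.
print(qtl_genotype_probs([0.0, 0.1, 0.2, 0.3], [1, None, None, 1], qtl_index=1))
```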

6.
Summary: Restriction fragment length polymorphism (RFLP) probes are being used increasingly in the molecular analysis of Down's syndrome and in determining the origin of nondisjunction in the syndrome. The information gained from RFLPs overlaps with, but differs from, the information obtained from cytogenetic heteromorphisms. From the allele frequencies of commonly available probes, we have derived the expected frequencies of all matings in the population. Each mating has been defined and partitioned to show the expected genotypes and phenotypes, with numerical values based on studies with heteromorphisms. From this, we show how the various phenotypes can be used to determine the origin of nondisjunctions and to calculate their expected frequencies. Further, an alternative method is outlined for mapping the distance between a probe and its centromere, based on the distortion, caused by crossing-over, of the expected ratio of first- to second-division nondisjunction. Finally, we discuss prospects for various uses of probes in the analysis of Down's syndrome.

7.
Obbard DJ, Harris SA, Pannell JR. Heredity, 2006, 97(4): 296-303
The analysis of genetic diversity within and between populations is a routine task in the study of diploid organisms. However, population genetic studies of polyploid organisms have been hampered by difficulties associated with scoring and interpreting molecular data. This occurs because the presence of multiple alleles at each locus often precludes the measurement of genotype or allele frequencies. In allopolyploids, the problem is compounded because genetically distinct isoloci frequently share alleles. As a result, analysis of genetic diversity patterns in allopolyploids has tended to rely on the interpretation of phenotype frequencies, which loses information available from allele composition. Here, we propose the use of a simple allelic-phenotype diversity statistic (H') that measures diversity as the average number of alleles by which pairs of individuals differ. This statistic can be extended to a population differentiation measure (F'ST), which is analogous to FST. We illustrate the behaviour of these statistics using coalescent computer simulations that show that F'ST behaves in a qualitatively similar way to FST, thus providing a useful way to quantify population differentiation in allopolyploid species.
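One reading of the H' statistic can be sketched as follows: each individual's allelic phenotype is the set of alleles detected, two individuals differ by the size of the symmetric difference of their sets, and H' is the mean over all pairs. The F'ST analogue shown here, (H'_T - H'_S) / H'_T with H'_S the mean within-population value and H'_T the pooled value, is an assumption by analogy with FST; the published definitions may normalize differently, and the function names are hypothetical.

```python
from itertools import combinations


def allelic_diversity(phenotypes):
    """Mean number of alleles by which pairs of individuals differ.

    Each allelic phenotype is the set of alleles detected in an individual;
    two individuals differ by the size of the symmetric difference of their sets.
    """
    pairs = list(combinations(phenotypes, 2))
    return sum(len(x ^ y) for x, y in pairs) / len(pairs)


def differentiation(populations):
    """F'ST-style ratio (H'_T - H'_S) / H'_T, with H'_S the mean within-
    population diversity and H'_T the diversity of the pooled sample."""
    h_s = sum(allelic_diversity(pop) for pop in populations) / len(populations)
    h_t = allelic_diversity([ind for pop in populations for ind in pop])
    return (h_t - h_s) / h_t


pop1 = [frozenset("ab"), frozenset("ab"), frozenset("ac")]
pop2 = [frozenset("cd"), frozenset("cd"), frozenset("bd")]
print(allelic_diversity(pop1), differentiation([pop1, pop2]))
```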

9.
The squared sib-pair phenotype difference (SQD) has been used as a dependent variable in the Haseman-Elston (H-E) regression quantitative-trait locus (QTL) linkage method, but it has been shown that the SQD does not make full use of linkage information. In this study, we examine the efficiency of SQD in H-E regression compared to other proposed functions of the sib-pair phenotypes. A new function of sib-pair phenotypes, the product of pair values corrected with family mean (PCF), is shown to have desirable properties in many realistic situations. Consistent results were obtained using a combination of large-sample analytic approximations, simulation, and analyses of quantitative-trait data from Genetic Analysis Workshop 10. The advantages of PCF are further improved in the presence of family-specific effects arising from environmental factors or when additional QTLs influence the trait. All of the phenotype functions are incorporated in our new, freely available linkage-mapping program MULTIGENE 1.0 for the PC environment.
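For reference, the classic H-E regression that uses SQD as the dependent variable can be sketched as an ordinary least-squares fit of (y1 - y2)² on the estimated proportion of alleles shared identical by descent; a negative slope is evidence for linkage. The PCF function proposed in the abstract is not reproduced here, and the function name and toy data are hypothetical.

```python
def haseman_elston(pairs):
    """Classic Haseman-Elston regression of the squared sib-pair difference
    (SQD) on the estimated proportion of alleles shared IBD.

    pairs: iterable of (y1, y2, pi_hat). Returns (intercept, slope); a
    negative slope is evidence for linkage.
    """
    data = [((y1 - y2) ** 2, pi) for y1, y2, pi in pairs]
    n = len(data)
    mean_y = sum(y for y, _ in data) / n
    mean_x = sum(x for _, x in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in data)
    sxx = sum((x - mean_x) ** 2 for _, x in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data: sib pairs sharing more alleles IBD have more similar phenotypes.
print(haseman_elston([(2.0, 0.0, 0.0), (1.5, 0.5, 0.5), (0.0, 0.0, 1.0)]))
```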

11.
Jack da Silva. Genetics, 2009, 182(1): 265-275
The frequently reported amino acid covariation of the highly polymorphic human immunodeficiency virus type 1 (HIV-1) exterior envelope glycoprotein V3 region has been assumed to reflect fitness epistasis between residues. However, nonrandom association of amino acids, or linkage disequilibrium, has many possible causes, including population subdivision. If the amino acids at a set of sequence sites differ in frequencies between subpopulations, then analysis of the whole population may reveal linkage disequilibrium even if it does not exist in any subpopulation. HIV-1 has a complex population structure, and the effects of this structure on linkage disequilibrium were investigated by estimating within- and among-subpopulation components of variance in linkage disequilibrium. The amino acid covariation previously reported is explained by differences in amino acid frequencies among virus subpopulations in different patients and by nonsystematic disequilibrium among patients. Disequilibrium within patients appears to be entirely due to differences in amino acid frequencies among sampling time points and among chemokine coreceptor usage phenotypes of virus particles, but not source tissues. Positive selection explains differences in allele frequencies among time points and phenotypes, indicating that these differences are adaptive rather than due to genetic drift. However, the absence of a correlation between linkage disequilibrium and phenotype suggests that fitness epistasis is an unlikely cause of disequilibrium. Indeed, when population structure is removed by analyzing sequences from a single time point and phenotype, no disequilibrium is detectable within patients. These results caution against interpreting amino acid covariation and coevolution as evidence for fitness epistasis.
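The subdivision argument above can be made concrete with a small calculation: if every subpopulation is in linkage equilibrium but allele frequencies at both sites vary among subpopulations, the pooled sample still shows nonzero D (equal to the covariance of the two allele frequencies across subpopulations). This is a generic population-genetics sketch with diallelic sites and hypothetical frequencies, not the variance-components analysis used in the study.

```python
def pooled_ld(subpops, weights=None):
    """D in a pooled sample built from subpopulations that are each in
    linkage equilibrium (within-subpopulation D = 0).

    subpops: list of (p_a, p_b) allele frequencies per subpopulation.
    weights: relative subpopulation sizes (equal if omitted).
    """
    k = len(subpops)
    w = weights or [1.0 / k] * k
    total = sum(w)
    w = [x / total for x in w]
    # Within each subpopulation the AB haplotype frequency is p_a * p_b.
    p_ab = sum(wi * pa * pb for wi, (pa, pb) in zip(w, subpops))
    p_a = sum(wi * pa for wi, (pa, _) in zip(w, subpops))
    p_b = sum(wi * pb for wi, (_, pb) in zip(w, subpops))
    return p_ab - p_a * p_b


print(pooled_ld([(0.9, 0.9), (0.1, 0.1)]))  # about 0.16 despite D = 0 within each
print(pooled_ld([(0.9, 0.5), (0.1, 0.5)]))  # about 0: site B does not differ
```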

12.
A method is described for reconstructing the allele frequencies characteristic of an original, ethnically homogeneous population before the start of migration processes. Information on both the ethnic group studied and the offspring of interethnic marriages is used to estimate the allele frequencies. This makes it possible to increase the informativeness of the sample, which, in the case of ethnic heterogeneity, depends not only on allele frequencies and the total sample size, but also on the ethnic structure of the sample. The problem of estimating allele frequency in an ethnically heterogeneous sample has been solved analytically for diallelic loci. It has been demonstrated that if offspring of interethnic marriages with the same degree of outbreeding are added to a sample of the ethnic group studied, the sample informativeness does not change. To utilize the information contained in the phenotypes of the offspring of interethnic marriages, representatives of the population from which migration occurs should be included in the sample. The sample size ensuring a preassigned accuracy of estimation is minimized at a certain ratio between the numbers of offspring of interethnic marriages and "immigrants." To analyze polyallelic loci, a software package has been developed that estimates allele frequencies, determines the errors of these estimates, and plans a sample ensuring a preassigned accuracy of estimation. The package is available free at http://mga.bionet.nsc.ru/PopMixed/PopMixed.html.
Translated from Genetika, Vol. 41, No. 7, 2005, pp. 990–996. Original Russian Text Copyright © 2005 by Axenovich, Kirichenko.
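Why immigrants must be sampled alongside the offspring of interethnic marriages can be illustrated with a deliberately simplified model, which is an assumption and not the authors' analytic solution: if each offspring has one parent from the studied group (allele frequency p) and one immigrant parent (frequency q), the offspring allele frequency is (p + q)/2, so p can be recovered as 2·p_off - q only when q is estimated from an immigrant sample. The sketch combines that indirect estimate with the direct one by inverse-variance weighting. The function name and the example counts are hypothetical.

```python
def combined_estimate(x1, n1, x2, n2, x3, n3):
    """Combine two estimates of allele frequency p in an ethnic group.

    x1 / (2 n1): allele count among n1 members of the group (direct estimate).
    x2 / (2 n2): allele count among n2 first-generation offspring, each
                 assumed to have one parent from the group and one immigrant.
    x3 / (2 n3): allele count among n3 members of the immigrant population.
    Returns (p_hat, variance). Assumes all frequencies lie strictly in (0, 1).
    """
    p1 = x1 / (2 * n1)
    q = x3 / (2 * n3)
    p_off = x2 / (2 * n2)
    # Under the assumed model p_off = (p + q) / 2; clamp to a valid frequency.
    p2 = min(max(2 * p_off - q, 0.0), 1.0)
    var1 = p1 * (1 - p1) / (2 * n1)
    # Var(2 * p_off) = (p(1-p) + q(1-q)) / n2, plus Var(q_hat).
    var2 = (p2 * (1 - p2) + q * (1 - q)) / n2 + q * (1 - q) / (2 * n3)
    w1, w2 = 1 / var1, 1 / var2
    return (w1 * p1 + w2 * p2) / (w1 + w2), 1 / (w1 + w2)


est, var = combined_estimate(60, 100, 80, 100, 50, 50)
print(f"p_hat = {est:.3f}, variance = {var:.6f}")
```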

13.
Discoveries of mutations conferring resistance to infectious diseases have led to increased interest in the evolutionary dynamics of disease resistance. Several recent papers have estimated the historical strength of selection for mutations conferring disease resistance. These studies are based on simple population genetic models that do not take account of factors such as spatial and family structure. Such factors may have a substantial impact on the strength of natural selection through inclusive fitness effects. That is, people have a strong tendency to live with relatives and therefore have a high probability of transmitting infectious diseases to them. Thus, an allele that protects an individual against disease infection also protects that individual's family members. Because some of these family members are likely to also be carrying the allele, selection for that allele is magnified by family structure. In this paper, I use mathematical modeling techniques to explore the impact of such kin selection on the strength of selection for infectious disease resistance alleles. I show that if the resistance allele has the same proportional effect on both within- and between-family transmission, then the impact of kin selection is relatively minor. Selection coefficients are increased by 5-35%, with a greater benefit for weaker alleles. The reason is that an individual with a strong resistance allele does not need much protection from infection by family members and thus does not benefit much from their alleles. The effect of kin selection can be dramatic, however, if the resistance allele has a larger effect on between-family transmission than within-family transmission (which can occur if between-family infection rates are much smaller than within-family rates), increasing selection coefficients by as much as two- to threefold. 
These results show the conditions under which it is important to consider family structure when estimating the strength of selection for infectious disease resistance alleles.

14.
Genetic association studies require that the genotype data from a given person can be correctly linked to the phenotype data from the same person. However, sample misidentification errors sometimes occur, whereby the link becomes invalid for some of the subjects in a study. This can substantially reduce the power to detect truly associated variants. In family-based studies, Mendelian inconsistencies can be used to detect sample misidentification. Genome-wide association studies (GWAS), however, typically use unrelated individuals, making error detection more problematic. Here we present a method for identifying potential sample misidentifications in GWAS and other genetic association studies, building on ideas from forensic science. A widely used ad hoc method for error detection is to check whether the sex of an individual matches its X-linked genotype. We generalize this idea to less stringent associations between known genotypes and phenotypes, and show that combining several known associations substantially increases the power to detect misidentifications. Individuals with an unlikely set of phenotypes given their genotypes are flagged as potential errors. We provide analytical and simulation results comparing the odds that the genotype and phenotype both come from the same individual for different numbers of available genotype-phenotype associations and for different information content of the associations. Our method has good sensitivity and specificity with as few as ten moderately informative genotype-phenotype associations. We apply the method to GWAS data from the Danish National Birth Cohort.
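Combining several genotype-phenotype associations can be sketched as a sum of log likelihood ratios, each comparing the probability of the observed phenotype given the observed genotype (same person) with its population frequency (unrelated person). This assumes the associations are independent; the probabilities below are made up for illustration, and the function name is hypothetical rather than the authors' implementation.

```python
import math


def misidentification_llr(observations):
    """Sum of log likelihood ratios that genotype and phenotype data come from
    the same person rather than from two unrelated people.

    observations: list of (p_pheno_given_geno, p_pheno) pairs, one per known
    genotype-phenotype association (assumed independent), where
    p_pheno_given_geno is the probability of the observed phenotype given the
    observed genotype and p_pheno is its population frequency. Negative
    totals point towards a possible sample mix-up.
    """
    return sum(math.log(p_given / p_marg) for p_given, p_marg in observations)


# Hypothetical sex check plus two moderately informative associations.
consistent = [(0.99, 0.5), (0.6, 0.4), (0.7, 0.5)]
suspicious = [(0.01, 0.5), (0.2, 0.4), (0.3, 0.5)]
print(misidentification_llr(consistent), misidentification_llr(suspicious))
```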

15.
This paper addresses the following question: when do the phenotypic evolutionarily stable state (ESS) and the evolutionarily stable allele distribution (ESAD) coincide? It is supposed that, in a sexual population with a dominant-recessive inheritance system, n alleles at one autosomal locus determine n possible pure individual phenotypes, and each pure phenotype is obtained as the phenotype of a homozygote. Under these conditions, earlier results of the authors imply that if a phenotype distribution is an ESS, then the allele distribution generating it is an ESAD. In this paper, apart from certain degenerate payoff matrices, the converse is also proved: if an allele distribution is an ESAD, then the corresponding phenotype distribution is an ESS.
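As a reminder of what the phenotypic ESS condition means, the sketch below checks Maynard Smith's conditions for a pure strategy, given a payoff matrix. It covers only pure phenotypes in a single-population matrix game; it does not model the allele distribution or the ESAD concept, and the function name is hypothetical.

```python
def is_pure_ess(payoff, i):
    """Maynard Smith conditions for pure strategy i to be an ESS.

    payoff[x][y] is the payoff to strategy x against strategy y. Strategy i is
    an ESS if, for every j != i, either payoff[i][i] > payoff[j][i], or the
    two are equal and payoff[i][j] > payoff[j][j].
    """
    for j in range(len(payoff)):
        if j == i:
            continue
        if payoff[i][i] < payoff[j][i]:
            return False
        if payoff[i][i] == payoff[j][i] and payoff[i][j] <= payoff[j][j]:
            return False
    return True


prisoners_dilemma = [[3, 0], [5, 1]]  # strategy 0 = cooperate, 1 = defect
print(is_pure_ess(prisoners_dilemma, 0), is_pure_ess(prisoners_dilemma, 1))
```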

16.
Expression of Mdh1 alleles has been studied in 60 apozygotic (agamospermic) sugar beet progenies. Seed progenies were obtained by uniparental (pollenless) seed reproduction: selfing of pollen-sterile plants isolated with paper bags. The apozygotic seed progenies show disomic gamete autosegregation, i.e., the ratio between genotypes in the progenies corresponds to the gamete segregation in a duplex heterozygote of an autotetraploid. It was shown that the ratio between Mdh1 phenotypes in apozygotic progenies is strongly affected by spontaneous inactivation of one of the alleles. In most progenies, an excess of FF phenotypes and a deficit of SS phenotypes were observed. In our opinion, these deviations in genotype and phenotype frequencies result from conversion of the active Mdh1-S allele into the inactive Mdh1-S0 allele (epigenetic gene inactivation). The spontaneous inactivation of one allele results in highly variable frequencies of heterozygous Mdh1-F/Mdh1-S genotypes and phenotypes among the apozygotic seed progenies. The empirical distribution of the frequencies of heterozygous genotypes in the apozygotic seed progenies is described by a negative binomial distribution, which describes the expected time of occurrence of random events.
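Fitting a negative binomial distribution to overdispersed count data can be sketched with the method of moments. This is a generic illustration, assuming the data are counts (for example, heterozygotes per progeny) and using the (r, p) parameterization with mean r(1 - p)/p; the abstract does not say how the authors fitted their distribution, and the example counts are made up.

```python
from statistics import mean, variance


def negative_binomial_moments(counts):
    """Method-of-moments fit of a negative binomial distribution.

    Parameterized by the number of successes r and success probability p,
    with mean r(1 - p)/p and variance r(1 - p)/p**2. Requires the sample
    variance to exceed the mean (overdispersion).
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    if v <= m:
        raise ValueError("no overdispersion: negative binomial not identifiable")
    return m * m / (v - m), m / v


print(negative_binomial_moments([0, 0, 2, 4, 4]))  # mean 2, variance 4
```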

17.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, most of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic positions of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference that requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
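An example of a relatedness statistic that needs no allele frequencies or genomic positions is the KING-robust kinship coefficient, which depends only on counts of joint genotype configurations. The sketch below computes it from called genotypes; it is offered as an illustration of the allele-frequency-free idea, not as the method proposed in this paper, and it does not handle low-depth data (which would require genotype likelihoods instead of calls).

```python
def king_robust_kinship(geno1, geno2):
    """KING-robust kinship estimate from two individuals' genotypes.

    Genotypes are coded 0, 1, 2 (copies of one allele); None marks a missing
    call. Only joint genotype counts are used, so no allele frequencies are
    needed: (N_het,het - 2 * N_opposite_hom) / (N_het(1) + N_het(2)).
    """
    het_het = opposite_hom = het1 = het2 = 0
    for a, b in zip(geno1, geno2):
        if a is None or b is None:
            continue
        het1 += int(a == 1)
        het2 += int(b == 1)
        het_het += int(a == 1 and b == 1)
        opposite_hom += int({a, b} == {0, 2})
    return (het_het - 2 * opposite_hom) / (het1 + het2)


sample = [0, 1, 2, 1, 1, 0]
print(king_robust_kinship(sample, sample))  # duplicate sample: 0.5
```

Expected values are about 0.5 for duplicates or monozygotic twins, 0.25 for first-degree relatives and 0 (or below) for unrelated pairs.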

18.
Genetic researchers often collect disease-related quantitative traits in addition to disease status because they are interested in understanding the pathophysiology of disease processes. In genome-wide association (GWA) studies, these quantitative phenotypes may be relevant to disease development and serve as intermediate phenotypes, or they may be behavioral or other risk factors that predict disease risk. Statistical tests combining both disease status and quantitative risk factors should be more powerful than case-control tests alone, as they incorporate more information about the disease. In this paper, we propose a modified inverse-variance weighted meta-analysis method to combine disease status and quantitative intermediate phenotype information. The simulation results showed that when an intermediate phenotype was available, the inverse-variance weighted method had more power than a case-control study of complex diseases, especially in identifying susceptibility loci with minor effects. We further applied this modified meta-analysis to a study of imputed lung cancer genotypes with smoking data in 1154 cases and 1137 matched controls. The most significant SNPs came from the CHRNA3-CHRNA5-CHRNB4 region on chromosome 15q24–25.1, which has been replicated in many other studies. Our results confirm that this CHRNA region is associated with both lung cancer development and smoking behavior. We also detected three significant SNPs (rs1800469, rs1982072, and rs2241714) in the promoter region of the TGFB1 gene on chromosome 19 (p = 1.46 × 10^-5, 1.18 × 10^-5, and 6.57 × 10^-6, respectively). The SNP rs1800469 has been reported to be associated with chronic obstructive pulmonary disease and lung cancer in cigarette smokers. The present study is the first GWA study to replicate this result. Signals in the 3q26 region were also identified in the meta-analysis.
We demonstrate that an intermediate phenotype can potentially enhance the power of complex disease association analysis and that the modified meta-analysis method robustly incorporates an intermediate phenotype or other quantitative risk factor into the analysis.
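The standard (unmodified) fixed-effect inverse-variance weighted combination on which such methods build can be sketched as below. It assumes the two analyses yield effect estimates on a comparable scale with their standard errors; the specific modification proposed in the paper is not reproduced, and the numbers are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance weighted combination.

    estimates: list of (beta, se) pairs from separate analyses, e.g. a
    case-control test and an intermediate-phenotype test on a common scale.
    Returns (combined beta, combined se, z statistic).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


print(inverse_variance_meta([(0.2, 0.1), (0.4, 0.1)]))
```

With equal standard errors the combined estimate is the simple average, and its standard error shrinks by a factor of sqrt(2).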

19.
Although approaches for performing genome‐wide association studies (GWAS) are well developed, conventional GWAS requires high‐density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP‐GWAS (extreme‐phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well‐characterized kernel row number trait, which was selected to enable comparisons between the results of XP‐GWAS and conventional GWAS. An exome‐sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait‐associated variants were significantly enriched in regions identified by conventional GWAS. XP‐GWAS was able to resolve several linked QTL and detect trait‐associated variants within a single gene under a QTL peak. XP‐GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest.
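Comparing allele frequencies between extreme pools can be sketched as a 1-df Pearson chi-square test on allele read counts. This simplification treats every read as an independent allele draw and ignores the extra variance from pool construction and sequencing error, so it is not the statistical model used in XP-GWAS; the function name and counts are hypothetical.

```python
import math


def pool_allele_test(high_ref, high_alt, low_ref, low_alt):
    """Pearson chi-square (1 df) comparing allele read counts between the
    high- and low-phenotype pools. Returns (allele frequency difference,
    chi-square, p-value). Treats every read as an independent allele draw.
    """
    table = [[high_ref, high_alt], [low_ref, low_alt]]
    row = [sum(r) for r in table]
    col = [high_ref + low_ref, high_alt + low_alt]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    diff = high_alt / row[0] - low_alt / row[1]
    return diff, chi2, p_value


print(pool_allele_test(20, 80, 80, 20))  # strongly differentiated variant
print(pool_allele_test(50, 50, 50, 50))  # no difference: chi2 = 0, p = 1
```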

20.
Large-scale gene mapping efforts in domestic animals have generated and mapped a large number of genetic markers that are useful for mapping quantitative trait and disease loci and for DNA diagnostic purposes such as parentage testing. Marker polymorphism is an important criterion for selecting genetic markers when planning experiments for mapping quantitative trait loci or for DNA diagnostic purposes. Current formulations of marker polymorphism measures are functions of marker allele frequencies. In this study, two measures of marker polymorphism that are available from gene mapping studies and do not require allele frequencies were proposed and analyzed: the observed polymorphic information content (PIC) and the observed family information content (FIC). The observed FIC was more stable than the observed PIC because it is unaffected by variation in the frequency of heterozygous parents. However, both FIC and PIC depend on the gene mapping design. The effective number of alleles is recommended as a tool to standardize marker polymorphism measures, so that the polymorphism of different markers can be compared on an equal basis, and to obtain a new polymorphism measure (such as an exclusion probability) from an existing measure (such as FIC). The use of the effective number of alleles to standardize FIC, PIC and exclusion probabilities is illustrated using genetic markers in a published linkage map.
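The frequency-based quantities that the observed measures are compared against can be sketched as follows: expected heterozygosity, PIC (Botstein et al. 1980) and the effective number of alleles n_e = 1 / Σp_i². The second function maps n_e back to PIC by assuming a hypothetical marker with n_e equally frequent alleles, which is one way to read "standardizing" via the effective number of alleles; the paper's exact standardization may differ, and its observed PIC and FIC (which need no allele frequencies) are not computed here.

```python
from itertools import combinations


def marker_polymorphism(freqs):
    """Expected heterozygosity, PIC (Botstein et al. 1980) and effective
    number of alleles for a marker with allele frequencies `freqs`."""
    homozygosity = sum(p * p for p in freqs)
    het = 1.0 - homozygosity
    pic = het - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    n_eff = 1.0 / homozygosity
    return het, pic, n_eff


def pic_from_effective(n_eff):
    """PIC of a hypothetical marker with n_eff equally frequent alleles."""
    return 1.0 - 1.0 / n_eff - (n_eff - 1.0) / n_eff ** 3


print(marker_polymorphism([0.5, 0.5]))  # two equally frequent alleles
print(pic_from_effective(2.0))
```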
