Similar Articles
20 similar articles found (search time: 31 ms)
1.
Population size information is critical for managing endangered or harvested populations. Population size can now be estimated from non-invasive genetic sampling. However, pitfalls remain, such as genotyping errors (allele dropout and false alleles at microsatellite loci). To evaluate the feasibility of non-invasive sampling (e.g., for population size estimation), a pilot study is required. Here, we present a pilot study consisting of (i) a genetic step to test loci amplification and to estimate allele frequencies and genotyping error rates when using faecal DNA, and (ii) a simulation step to quantify and minimise the effects of errors on estimates of population size. The pilot study was conducted on a population of red deer in a fenced natural area of 5,440 ha in France. Twelve microsatellite loci were tested for amplification and genotyping errors. The genotyping error rates for microsatellite loci were 0–0.83 (mean = 0.2) for allele dropout and 0–0.14 (mean = 0.02) for false alleles, comparable to rates encountered in other non-invasive studies. Simulation results suggest that six PCR amplifications per sample (per locus) are needed to achieve approximately 97% correct genotypes. The remaining 3% error rate appears to have little influence on the accuracy and precision of population size estimation. This paper illustrates the importance of conducting a pilot study (including genotyping and simulations) when using non-invasive sampling to study threatened or managed populations.
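The simulation step can be sketched compactly. Below is a minimal Monte Carlo sketch of the multiple-tubes approach for one heterozygous sample, assuming a flat per-replicate dropout probability of 0.2 (the mean rate reported above), ignoring false alleles, and using a simple consensus rule (each allele must be seen in at least two replicates); the authors' actual simulation may differ in all of these details.

```python
import random

def consensus_ok(true_het=("A", "B"), n_reps=6, p_dropout=0.2):
    """Simulate one multiple-tubes consensus for a true heterozygote.

    Each PCR replicate drops one allele with probability p_dropout
    (false alleles are ignored for brevity); the consensus accepts an
    allele only if it appears in at least two replicates."""
    counts = {a: 0 for a in true_het}
    for _ in range(n_reps):
        if random.random() < p_dropout:            # dropout: one allele amplifies
            counts[random.choice(true_het)] += 1
        else:                                      # both alleles amplify
            for a in true_het:
                counts[a] += 1
    return all(c >= 2 for c in counts.values())

for n_reps in (2, 4, 6, 8):
    ok = sum(consensus_ok(n_reps=n_reps) for _ in range(100_000))
    print(f"{n_reps} replicates: {ok / 1000:.1f}% correct heterozygote consensus")
```

Re-running the sketch with locus-specific dropout rates (up to 0.83 here) shows why the number of replicates needed is locus dependent, which is the point of the pilot simulations.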

2.
Two-stage designs in case-control association analysis
Zuo Y, Zou G, Zhao H. Genetics 2006;173(3):1747-1760.
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on power. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are at least 0.05 for reasonably large sample sizes, even when the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
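The headline power calculation can be reproduced approximately with a normal-theory sketch. The variance model below (binomial sampling plus additive measurement-error variance for each pool) and the stage-1 selection threshold `alpha` are our assumptions for illustration, not the paper's exact derivation:

```python
from math import sqrt
from scipy.stats import norm

def stage1_power(p0=0.30, diff=0.05, n_cases=1000, n_controls=1000,
                 sigma_e=0.005, alpha=1e-4):
    """Approximate power of a first-stage DNA-pooling comparison of case
    and control allele frequencies differing by `diff`.

    Each pooled frequency estimate carries binomial variance p(1-p)/(2n)
    plus measurement-error variance sigma_e**2; alpha is a hypothetical
    stage-1 selection threshold."""
    p1 = p0 + diff
    var = (p0 * (1 - p0) / (2 * n_cases) + sigma_e ** 2
           + p1 * (1 - p1) / (2 * n_controls) + sigma_e ** 2)
    z = norm.ppf(1 - alpha / 2)
    delta = diff / sqrt(var)
    return norm.cdf(delta - z) + norm.cdf(-delta - z)

print(f"stage-1 power: {stage1_power():.2f}")
```

Varying `sigma_e` in the sketch shows how pooling measurement error trades off against pool size under these assumptions.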

3.
Kang SJ, Finch SJ, Haynes C, Gordon D. Human Heredity 2004;58(3-4):139-144.
Kang et al. [Genet Epidemiol 2004;26:132-141] addressed the question of which genotype misclassification errors are most costly, in terms of the minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association in a genetic model-free setting. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. We address the same question here in a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequency, and linkage disequilibrium. %MSSN coefficients for each of the six possible error rates are determined by expanding the noncentrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis to approximate %MSSN with a linear Taylor series in the error rates. In this work we assume that errors misclassifying one homozygote as the other homozygote occur with probability 0, since such errors are thought to occur rarely in practice. Our findings are that there are settings of the genetic model parameters that lead to large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, total %MSSN increases without bound, independent of the other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. Use of SNPs with small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use to study a disease is given.
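The %MSSN idea rests on the fact that the required sample size is inversely proportional to the per-observation noncentrality of the 2 × 3 χ² test. A rough sketch with made-up genotype distributions and error rates (the paper instead derives Taylor-series coefficients in each error rate):

```python
import numpy as np

def mssn_percent(cases, controls, error_matrix, case_fraction=0.5):
    """%MSSN for the 2 x 3 chi-square test of independence.

    cases/controls: genotype probability vectors (AA, Aa, aa);
    error_matrix[i, j] = P(observe genotype j | true genotype i).
    Required N scales as 1/phi^2, the per-observation noncentrality,
    so %MSSN = (phi2_true / phi2_observed - 1) * 100."""
    def phi2(ca, co):
        table = np.vstack([case_fraction * ca, (1 - case_fraction) * co])
        expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True)
        return ((table - expected) ** 2 / expected).sum()

    return (phi2(cases, controls)
            / phi2(cases @ error_matrix, controls @ error_matrix) - 1) * 100

# illustrative dominant-model distributions; homozygote<->homozygote errors set to 0
E = np.array([[0.99, 0.01, 0.00],
              [0.01, 0.98, 0.01],
              [0.00, 0.01, 0.99]])
print(mssn_percent(np.array([0.10, 0.45, 0.45]),
                   np.array([0.04, 0.32, 0.64]), E))
```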

4.
Microsatellite genotyping from samples with varying quality can result in an uneven distribution of errors. Previous studies reporting error rates have focused on estimating the effects of both randomly distributed and locus-specific errors. Sample-specific errors, however, can also significantly affect results in population studies despite a large sample size. From two studies including six microsatellite markers genotyped from 272 sperm whale DNA samples, and 33 microsatellites genotyped from 213 bowhead whales, we investigated the effects of sample- and locus-specific errors on calculations of Hardy–Weinberg equilibrium. The results of a jackknife analysis in these two studies identified seven individuals that were highly influential on estimates of Hardy–Weinberg equilibrium for six different markers. In each case, the influential individual was homozygous for a rare allele. Our results demonstrate that Hardy–Weinberg P values are very sensitive to homozygosity in rare alleles for single individuals, and that > 50% of these cases involved genotype errors likely due to low sample quality. This raises the possibility that even small, normal levels of laboratory error can result in an overestimate of the degree to which markers are out of Hardy–Weinberg equilibrium and hence an overestimate of population structure. To avoid such bias, we recommend routine identification of influential individuals and multiple replications of those samples.
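The jackknife screen for influential individuals is easy to implement. The sketch below uses a biallelic χ² HWE test for brevity (the studies above used multiallelic microsatellites and exact tests) and flags the individual whose removal most changes the P value:

```python
import numpy as np
from scipy.stats import chi2

def hwe_pvalue(g):
    """Chi-square HWE test; g holds per-individual minor-allele counts (0/1/2)."""
    n = len(g)
    obs = np.bincount(g, minlength=3)                     # (AA, Aa, aa)
    p = (2 * obs[2] + obs[1]) / (2 * n)                   # minor-allele frequency
    exp = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    return chi2.sf(((obs - exp) ** 2 / exp).sum(), df=1)

def jackknife_influence(g):
    """Change in the HWE p-value when each individual is left out."""
    full = hwe_pvalue(g)
    return np.array([hwe_pvalue(np.delete(g, i)) - full for i in range(len(g))])

rng = np.random.default_rng(1)
g = rng.binomial(2, 0.05, size=200)        # a rare allele...
g[0] = 2                                   # ...plus one rare-allele homozygote
print("most influential individual:", int(np.argmax(jackknife_influence(g))))
```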

5.
Next-generation sequencing data will soon become routinely available for association studies between complex traits and rare variants. Sequencing data, however, are characterized by the presence of sequencing errors at each individual genotype. This makes it especially challenging to perform association studies of rare variants, which, due to their low minor allele frequencies, can be easily perturbed by genotype errors. In this article, we develop the quality-weighted multivariate score association test (qMSAT), a new procedure that allows powerful association tests between complex traits and multiple rare variants in the presence of sequencing errors. Simulation results based on quality scores from real data show that the qMSAT often dominates current methods that do not utilize quality information. In particular, the qMSAT can dramatically increase power over existing methods at moderate sample sizes and relatively low coverage. Moreover, in an obesity study, we used the qMSAT to identify two functional regions (the MGLL promoter and the MGLL 3'-untranslated region) where rare variants are associated with extreme obesity. Given the high cost of sequencing data, the qMSAT is especially valuable for large-scale studies involving rare variants, as it can potentially increase power without additional experimental cost. qMSAT is freely available at http://qmsat.sourceforge.net/.

6.
Tung L, Gordon D, Finch SJ. Human Heredity 2007;63(2):101-110.
This paper extends gene-environment (G × E) interaction study designs, in which the gene (G) is known and the environmental variable (E) is specified, to the analysis of 'time-to-event' data, using Cox proportional hazards (PH) modeling. The objectives are to assess whether a random sample of subjects can be used to detect a specific G × E interaction and to study the sensitivity of the power of PH modeling to genotype misclassification. We find that a random sample of 2,100 is sufficient to detect a moderate G × E interaction. The increase in sample size necessary (SSN) to maintain Type I and Type II error rates is calculated for each of the six genotyping errors for both dominant and recessive modes of inheritance (MOI). The increase in SSN required is relatively small when each genotyping error rate is less than 1% and the disease allele frequency is between 0.2 and 0.5. The genotyping errors that require the greatest increase in SSN are any misclassification of a subject without the at-risk genotype as having the at-risk genotype. Such errors require an indefinitely large increase in SSN as the disease allele frequency approaches 0, suggesting that it is especially important that subjects recorded as having the at-risk genotype be correctly genotyped. Additionally, for a dominant MOI, large increases in SSN can occur at large disease allele frequencies.

7.
Microsatellite genotyping errors will be present in all but the smallest data sets and have the potential to undermine the conclusions of most downstream analyses. Despite this, little rigorous effort has been made to quantify the size of the problem and to identify the commonest sources of error. Here, we use a large data set comprising almost 2000 Antarctic fur seals Arctocephalus gazella genotyped at nine hypervariable microsatellite loci to explore error detection methods, common sources of error and the consequences of errors on paternal exclusion. We found good concordance among a range of contrasting approaches to error-rate estimation, our range being 0.0013 to 0.0074 per single-locus PCR (polymerase chain reaction). The best approach probably involves blind repeat-genotyping, but this is also the most labour-intensive. We show that several other approaches are also effective at detecting errors, although the most convenient alternative, namely mother-offspring comparisons, yielded the lowest estimate of the error rate. In total, we found 75 errors, emphasizing their ubiquitous presence. The most common errors involved the misinterpretation of allele banding patterns (n = 60, 80%) and of these, over a third (n = 22, 36.7%) were due to confusion between homozygote and adjacent-allele heterozygote genotypes. A specific test for whether a data set contains the expected number of adjacent-allele heterozygotes could provide a useful tool with which workers can assess the likely size of the problem. Error rates are also positively correlated with both locus polymorphism and product size, again indicating aspects where extra effort at error reduction should be directed. Finally, we conducted simulations to explore the potential impact of genotyping errors on paternity exclusion. Error rates as low as 0.01 per allele resulted in a rate of false paternity exclusion exceeding 20%. Errors also led to reduced estimates of male reproductive skew and increases in the numbers of pups that matched more than one candidate male. Because even modest error rates can be strongly influential, we recommend that error rates should be routinely published and that researchers make an attempt to calculate how robust their analyses are to errors.
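The suggested check for adjacent-allele heterozygotes can be phrased as a simple binomial comparison of observed and HWE-expected counts. The HWE expectation and the exact test form below are our assumptions; the authors propose such a test only in outline:

```python
from collections import Counter
from scipy.stats import binomtest

def adjacent_het_check(genotypes, repeat=2):
    """Compare observed vs HWE-expected counts of heterozygotes whose
    alleles are one repeat unit apart - the class most easily mis-scored
    as homozygotes. genotypes: (allele_size, allele_size) tuples."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    total = sum(allele_counts.values())
    freq = {a: c / total for a, c in allele_counts.items()}
    # expected proportion of adjacent-allele heterozygotes under HWE
    p_adj = sum(2 * freq[a] * freq.get(a + repeat, 0.0) for a in freq)
    observed = sum(1 for a, b in genotypes if abs(a - b) == repeat)
    return observed, n * p_adj, binomtest(observed, n, p_adj).pvalue

geno = [(150, 152)] * 20 + [(150, 150)] * 60 + [(152, 152)] * 20   # toy data
print(adjacent_het_check(geno))   # a deficit hints at mis-scored heterozygotes
```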

8.
Much forensic inference based upon DNA evidence is made assuming that Hardy-Weinberg equilibrium (HWE) holds for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, each having advantages and limitations. The limitations become more obvious when testing for deviation within multiallelic DNA loci is attempted. Here we present an exact test for HWE in the biallelic case, based on the ratio of weighted likelihoods under the null and alternative hypotheses, the Bayes factor. This test does not depend on asymptotic results and minimizes a linear combination of type I and type II errors. By ordering the sample space using the Bayes factor, we also define a significance (evidence) index, the P value, using the weighted likelihood under the null hypothesis. We compare it to the conditional exact test for the case of sample size n = 10. Using the idea underlying the method of χ² partition, the test is applied sequentially to test equilibrium in the multiallelic case and is then applied to two short tandem repeat loci, using a real Caucasian data bank, demonstrating its usefulness.
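For the biallelic case the Bayes factor has a closed form once priors are fixed. The sketch below assumes a uniform prior on the allele frequency under H0 and a flat Dirichlet(1,1,1) prior on genotype probabilities under H1 (the paper's weighting may differ); the multinomial coefficient cancels in the ratio:

```python
from math import lgamma, log, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def hwe_bayes_factor(n_aa, n_ab, n_bb):
    """Bayes factor for H0 (HWE) against H1 (unconstrained genotype
    probabilities) at a biallelic locus, under flat priors."""
    n = n_aa + n_ab + n_bb
    # H0: integrate 2**n_ab * p**(2*n_bb + n_ab) * (1-p)**(2*n_aa + n_ab) over p
    log_m0 = n_ab * log(2) + log_beta(2 * n_bb + n_ab + 1, 2 * n_aa + n_ab + 1)
    # H1: Dirichlet(1,1,1)-multinomial marginal likelihood
    log_m1 = (lgamma(3) + lgamma(n_aa + 1) + lgamma(n_ab + 1)
              + lgamma(n_bb + 1) - lgamma(n + 3))
    return exp(log_m0 - log_m1)

print(hwe_bayes_factor(5, 3, 2))   # n = 10, the sample size compared in the paper
```

Values above 1 favour HWE; ordering possible samples by this ratio is what induces the paper's evidence index.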

9.
Case-control disease-marker association studies are often used in the search for variants that predispose to complex diseases. One approach to increasing the power of these studies is to enrich the case sample for individuals likely to be affected because of genetic factors. In this article, we compare three case-selection strategies that use allele-sharing information with the standard strategy that selects a single individual from each family at random. In affected sibship samples, we show that, by carefully selecting sibships and/or individuals on the basis of allele sharing, we can increase the frequency of disease-associated alleles in the case sample. When these cases are compared with unrelated controls, the difference in the frequency of the disease-associated allele is therefore also increased. We find that, by choosing the affected sib who shows the most evidence for pairwise allele sharing with the other affected sibs in families, the test statistic is increased by >20%, on average, for additive models with modest genotype relative risks. In addition, we find that the per-genotype information associated with the allele sharing-based strategies is increased compared with that associated with random selection of a sib for genotyping. Even though we select sibs on the basis of a nonparametric statistic, the additional gain for selection based on the unknown underlying mode of inheritance is minimal. We show that these properties hold even when the power to detect linkage to a region in the entire sample is negligible. This approach can be extended to more-general pedigree structures and quantitative traits.

10.
The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular and intuitive statistical test for studies of complex inheritance, as it is nonparametric and robust to population stratification. We carried out a literature search and located 79 significant TDT-derived associations between a microsatellite marker allele and a disease. Among these, there were 31 (39%) in which the most common allele was found to exhibit distorted transmission to affected offspring, implying that the allele may be associated with either susceptibility to or protection from a disease. In 27 of these 31 studies (87%), the most common allele appeared to be overtransmitted to affected offspring (a risk factor), and, in the remaining 4 studies, the most common allele appeared to be undertransmitted (a protective factor). In a second literature search, we identified 92 case-control studies in which a microsatellite marker allele was found to have significantly different frequencies in case and control groups. Of these, there were 37 instances (40%) in which the most common allele was involved. In 12 of these 37 studies (32%), the most common allele was enriched in cases relative to controls (a risk factor), and, in the remaining 25 studies, the most common allele was enriched in controls (a protective factor). Thus, the most common allele appears to be a risk factor when identified through the TDT, and it appears to be protective when identified through case-control analysis. To understand this phenomenon, we incorporated an error model into the calculation of the TDT statistic. We show that undetected genotyping error can cause apparent transmission distortion at markers with alleles of unequal frequency. We demonstrate that this distortion is in the direction of overtransmission for common alleles. Therefore, we conclude that undetected genotyping errors may be contributing to an inflated false-positive rate among reported TDT-derived associations and that genotyping fidelity must be increased.
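The error mechanism is easy to reproduce by simulation. The sketch below generates null trios at a biallelic marker with unequal allele frequencies, applies random allele-level miscalls, discards Mendelian-inconsistent trios (the detected errors) and computes the TDT statistic; the error model and rates are illustrative, not the paper's exact model:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def miscall_genotypes(g, err):
    """Allele-level random miscalls: decompose each genotype into two
    allele calls and flip each call independently with probability err."""
    a1, a2 = (g >= 1).astype(int), (g == 2).astype(int)
    f1, f2 = rng.random(g.shape) < err, rng.random(g.shape) < err
    return np.where(f1, 1 - a1, a1) + np.where(f2, 1 - a2, a2)

def tdt_null_with_errors(n=200_000, p=0.2, err=0.01):
    """Trios simulated under the null (marker unlinked to disease):
    b = transmissions of the rare allele from observed het parents,
    c = non-transmissions; Mendelian-inconsistent trios are discarded,
    mimicking error cleaning that only catches detectable errors."""
    par = (rng.random((n, 2, 2)) < p).astype(int)          # parental allele calls
    which = rng.integers(0, 2, (n, 2))                     # transmitted allele index
    child = np.take_along_axis(par, which[:, :, None], axis=2)[:, :, 0]
    g_par = miscall_genotypes(par.sum(axis=2), err)        # observed genotypes
    g_child = miscall_genotypes(child.sum(axis=1), err)
    het = (g_par == 1).sum(axis=1)
    b = g_child - (g_par == 2).sum(axis=1)                 # rare-allele transmissions from hets
    ok = (b >= 0) & (b <= het)                             # Mendelian-consistent trios
    b_tot, c_tot = int(b[ok].sum()), int((het - b)[ok].sum())
    stat = (b_tot - c_tot) ** 2 / (b_tot + c_tot)
    return b_tot, c_tot, chi2.sf(stat, df=1)

print(tdt_null_with_errors())   # expect c > b: apparent overtransmission of the common allele
```

Because common-homozygote parents vastly outnumber rare-homozygote parents, miscalls create far more spurious heterozygotes whose recorded transmission is the common allele, so c exceeds b even though nothing is linked.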

11.
Microsatellite data are widely used to test ecological and evolutionary hypotheses in wild populations. In this paper, we consider three typical sources of scoring errors capable of biasing biological conclusions: stuttering, large-allele dropout and null alleles. We describe methods to detect errors and propose conventions to mitigate scoring errors and report error rates in studies of wild populations. Finally, we discuss potential bias in ecological or evolutionary conclusions based on data sets containing these scoring errors.

12.
Moskvina V, Schmidt KM. Biometrics 2006;62(4):1116-1123.
With the availability of fast genotyping methods and genomic databases, the search for statistical associations of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and a decrease in statistical power. We develop a systematic approach to studying how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both the method and the general conclusions apply to the general error model; we give detailed results for allele-based errors whose size depends on both the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to a loss of statistical power for nondifferential genotyping errors and an increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify the maximally affected distributions. As these correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant associations based on such alleles/haplotypes are observed in association studies.
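The tensor-product reduction is directly usable in code: build a per-marker genotype error matrix from allele-level miscall rates, then Kronecker-multiply across markers. The rates below are illustrative; the paper's general model allows them to depend on locus and allele:

```python
import numpy as np

def genotype_error_matrix(e12, e21):
    """3x3 genotype error matrix induced by independent allele-level
    miscalls (allele 1 read as 2 with prob e12, 2 read as 1 with prob
    e21). Genotype order: (11, 12, 22)."""
    A = np.array([[1 - e12, e12],
                  [e21, 1 - e21]])        # per-allele transition matrix
    pairs = [(0, 0), (0, 1), (1, 1)]      # unordered allele pairs
    M = np.zeros((3, 3))
    for i, (a, b) in enumerate(pairs):
        for j, (c, d) in enumerate(pairs):
            M[i, j] = A[a, c] * A[b, d] + (A[a, d] * A[b, c] if c != d else 0)
    return M

E1 = genotype_error_matrix(0.01, 0.02)
E2 = genotype_error_matrix(0.005, 0.005)
E_joint = np.kron(E1, E2)                 # two-marker error matrix (9 x 9)
p_true = np.full(9, 1 / 9)                # toy two-locus genotype distribution
print(p_true @ E_joint)                   # distribution observed after errors
```

Each row of `M` sums to 1, so `np.kron` yields a proper 9 × 9 transition matrix on two-locus genotype distributions, which is the structure the paper exploits.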

13.
When applying the Cochran-Armitage (CA) trend test for an association between a candidate allele and a disease in a case-control study, a set of scores must be assigned to the genotypes. Sasieni (1997, Biometrics 53, 1253-1261) suggested scores for the recessive, additive, and dominant models but did not examine their statistical properties. Using the criterion of minimizing the sample size required for the CA trend test to achieve prespecified type I and type II error rates, we show that the scores given by Sasieni (1997) are optimal for the recessive and dominant models and locally optimal for the additive one. Moreover, the additive scores are shown to be locally optimal for the multiplicative model. The tests are applied to a real dataset.
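For reference, the CA trend test with Sasieni's scores is short to write out; the variance below is the usual large-sample approximation:

```python
import numpy as np
from scipy.stats import norm

def ca_trend_test(cases, controls, scores):
    """Cochran-Armitage trend test for a 2x3 genotype table.
    cases/controls: genotype counts (AA, Aa, aa); scores: e.g. Sasieni's
    (0, 0, 1) recessive, (0, 1, 2) additive, (0, 1, 1) dominant."""
    cases, controls, x = map(np.asarray, (cases, controls, scores))
    n_i = cases + controls                 # genotype totals
    n, r = n_i.sum(), cases.sum()
    num = (x * (cases - r / n * n_i)).sum()
    xbar = (x * n_i).sum() / n
    var = r * (n - r) / n * ((x - xbar) ** 2 * n_i).sum() / n
    z = num / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

print(ca_trend_test([10, 50, 40], [20, 50, 30], (0, 1, 2)))   # additive scores
```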

14.
Geller F, Ziegler A. Human Heredity 2002;54(3):111-117.
One well-known approach to the analysis of transmission disequilibrium is the investigation of single nucleotide polymorphisms (SNPs) in trios consisting of an affected child and its parents. Results may be biased by erroneously recorded genotypes. Several causes, among them sample swaps or wrong pedigree structure, are possible sources of bias. As these can be partly ruled out by good study conditions together with checks of pedigree structure using a series of independent markers, the main remaining cause of error is genotyping error. Some errors can be detected by Mendelian checks, whilst others are compatible with the pedigree structure. The extent of genotyping error can be estimated from the rate of errors detected by Mendelian checks. In many studies only one SNP of a specific genomic region is investigated by the TDT, which leaves Mendelian checks as the only tool to control genotyping errors. From the rate of detected errors the true error rate can be estimated. Gordon et al. [Hum Hered 1999;49:65-70] considered genotyping errors that occur randomly and independently, with some fixed probability of miscalling an allele. In practice, instead of single alleles, SNP genotypes are determined. Therefore, we study the proportion of detected errors (the detection rate) based on genotypes. In contrast to Gordon et al., who reported detection rates between 25 and 30%, we obtain higher detection rates, ranging from 39 up to 61%, when considering likely error structures in the data. We conclude that detection rates are probably substantially higher than those reported by Gordon et al.
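A genotype-level detection rate like the one discussed can be estimated by direct simulation. The error model below (a miscalled genotype becomes one of the other two genotypes uniformly at random) is only one possible error structure, which is exactly why detection rates vary:

```python
import numpy as np

rng = np.random.default_rng(7)

def detection_rate(n_trios=100_000, p=0.3, err=0.01):
    """Fraction of trio genotype errors caught by Mendelian checks at a
    biallelic SNP (genotypes coded as 0/1/2 copies of allele A)."""
    gf, gm = rng.binomial(2, p, n_trios), rng.binomial(2, p, n_trios)
    gc = rng.binomial(1, gf / 2) + rng.binomial(1, gm / 2)   # one allele per parent
    genos = np.stack([gf, gm, gc])
    # corrupt one randomly chosen member in a fraction err of trios
    hit = rng.random(n_trios) < err
    who = rng.integers(0, 3, n_trios)
    shift = rng.integers(1, 3, n_trios)                      # move to another genotype
    idx = np.where(hit)[0]
    genos[who[idx], idx] = (genos[who[idx], idx] + shift[idx]) % 3
    # Mendelian check: child's allele count must fit the parents' genotypes
    lo = (genos[0] == 2).astype(int) + (genos[1] == 2).astype(int)
    hi = (genos[0] >= 1).astype(int) + (genos[1] >= 1).astype(int)
    inconsistent = (genos[2] < lo) | (genos[2] > hi)
    return inconsistent[idx].mean()

print(f"errors detected by Mendelian checks: {detection_rate():.0%}")
```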

15.
The coancestry coefficient, also known as the population structure parameter, is of great interest in population genetics. It can be thought of as the intraclass correlation of pairs of alleles within populations, and it can serve as a measure of genetic distance between populations. For a general class of evolutionary models it determines the distribution of allele frequencies among populations. Under more restrictive models it can be regarded as the probability of identity by descent of any pair of alleles at a locus within a random mating population. In this paper we review estimation procedures that use the method of moments or are maximum likelihood under the assumption of normally distributed allele frequencies. We then consider the problem of testing hypotheses about this parameter. In addition to parametric and nonparametric bootstrap tests, we present an asymptotic chi-square test. This test reduces to the contingency-table test for equal sample sizes across populations. Our new test appears to be more powerful than previous tests, especially for loci with multiple alleles. We apply our methods to HapMap SNP data to confirm that the coancestry coefficient for humans is strictly positive.
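A minimal version of the method-of-moments estimator reviewed here, written for a biallelic locus and treating alleles as the sampling unit (the full Weir-Cockerham diploid estimator adds a within-individual component):

```python
import numpy as np

def theta_moments(counts, sizes):
    """ANOVA-style moment estimator of the coancestry coefficient theta.

    counts: copies of allele A per population; sizes: total allele counts
    per population. theta = (MSP - MSG) / (MSP + (nc - 1) * MSG)."""
    counts, sizes = map(np.asarray, (counts, sizes))
    r = len(counts)
    p = counts / sizes
    n_tot = sizes.sum()
    p_bar = counts.sum() / n_tot
    nc = (n_tot - (sizes ** 2).sum() / n_tot) / (r - 1)
    msp = (sizes * (p - p_bar) ** 2).sum() / (r - 1)          # between populations
    msg = (sizes * p * (1 - p)).sum() / (sizes - 1).sum()     # within populations
    return (msp - msg) / (msp + (nc - 1) * msg)

print(theta_moments([30, 70, 55], [200, 200, 200]))   # three toy populations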

16.
The Detection of Linkage Disequilibrium in Molecular Sequence Data
Lewontin RC. Genetics 1995;140(1):377-388.
Studies of genetic variation in natural populations at the sequence level usually show that most polymorphic sites are very asymmetrical in allele frequencies, with the commoner allele at a site near fixation. When the rarer allele at a site is present only a few times in the sample, say fewer than five copies, it becomes very difficult to detect linkage disequilibrium between sites from tests of association. This is a consequence of the numerical properties of even the most powerful test of association, Fisher's exact test. Sites with fewer than five representatives of the rarer allele in the sample should be excluded from association tests, but this generally leaves few site pairs eligible for testing. A test for overall linkage disequilibrium, based on the signs of the observed linkage disequilibria, is derived that can use all the data. It is shown that more power can be achieved by increasing the length of sequence determined than by increasing the number of genomes sampled, for the same total work.
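The filtering-plus-sign-test idea can be sketched as follows. Treating the sign of D as equally likely to be positive or negative under the null is a simplification; the paper derives the exact null distribution of the signs:

```python
import numpy as np
from scipy.stats import fisher_exact, binomtest

def pairwise_ld_tests(haps, min_minor=5):
    """Pairwise LD tests on a 0/1 haplotype matrix (1 = rarer allele).

    Site pairs are tested with Fisher's exact test only when both sites
    carry the rarer allele at least min_minor times; the signs of the
    observed D values are then pooled into a binomial sign test."""
    n, m = haps.shape
    keep = [j for j in range(m) if min_minor <= haps[:, j].sum() <= n - min_minor]
    signs, pvals = [], []
    for a, i in enumerate(keep):
        for j in keep[a + 1:]:
            ci, cj = int(haps[:, i].sum()), int(haps[:, j].sum())
            n11 = int((haps[:, i] & haps[:, j]).sum())
            d = n11 / n - (ci / n) * (cj / n)                 # coupling of rare alleles
            signs.append(d > 0)
            pvals.append(fisher_exact([[n11, ci - n11],
                                       [cj - n11, n - ci - cj + n11]])[1])
    return pvals, binomtest(sum(signs), len(signs), 0.5)

rng = np.random.default_rng(3)
haps = (rng.random((60, 12)) < 0.15).astype(int)     # toy unlinked haplotypes
pvals, sign_test = pairwise_ld_tests(haps)
print(len(pvals), "testable pairs; sign-test p =", round(sign_test.pvalue, 3))
```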

17.
Hardy-Weinberg disequilibrium (HWD) measures based on tightly linked marker loci, and independent of marker allele frequencies, are used for the fine mapping of quantitative trait loci (QTL). This paper discusses the properties of such HWD measures in the presence of genotyping errors. We show that, when genotyping errors are present, the two HWD measures designed for the case where marker allele frequencies are known in the whole population remain valid, although they are affected by the errors; in contrast, the two HWD measures designed for the case where marker allele frequencies are known only in extreme samples depend on both the genotyping errors and the marker allele frequencies. Computer simulations show that the latter two measures produce bias in fine mapping and are therefore unsuitable for it.
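For orientation, the quantity underlying such indices is the HWD coefficient at a marker locus; a minimal version is below (the frequency-free indices discussed in the paper are ratios built from it):

```python
def hwd_coefficient(n_aa, n_ab, n_bb):
    """Hardy-Weinberg disequilibrium coefficient d = P(AA) - p_A**2,
    estimated from genotype counts at a marker locus."""
    n = n_aa + n_ab + n_bb
    p_a = (2 * n_aa + n_ab) / (2 * n)
    return n_aa / n - p_a ** 2

print(hwd_coefficient(30, 40, 30))   # 0.3 - 0.5**2 = 0.05
```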

18.
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
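Both summary-statistic classes used by PopSizeABC are straightforward to compute from unphased 0/1/2 genotype matrices; a sketch follows (the bin boundaries and toy data are illustrative, not the tool's defaults):

```python
import numpy as np

def folded_afs(genos):
    """Folded allele frequency spectrum from an (individuals x SNPs)
    matrix of 0/1/2 genotypes (unpolarized data)."""
    n_chrom = 2 * genos.shape[0]
    counts = genos.sum(axis=0)
    folded = np.minimum(counts, n_chrom - counts)          # minor-allele counts
    return np.bincount(folded, minlength=n_chrom // 2 + 1)

def zygotic_r2(genos, pos, d_min, d_max):
    """Mean squared genotype correlation (zygotic LD) over SNP pairs whose
    physical distance falls in [d_min, d_max); works on unphased data."""
    vals = []
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if d_min <= abs(pos[j] - pos[i]) < d_max:
                r = np.corrcoef(genos[:, i], genos[:, j])[0, 1]
                vals.append(r ** 2)
    return np.mean(vals) if vals else np.nan

rng = np.random.default_rng(5)
genos = rng.integers(0, 3, size=(25, 40))            # 25 genomes, 40 SNPs (toy)
pos = np.sort(rng.integers(0, 2_000_000, size=40))
print(folded_afs(genos)[:6], zygotic_r2(genos, pos, 50_000, 500_000))
```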

19.
Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately, this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons, including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies, the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 and replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects, whereby a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions.
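The core effect can be probed with a small simulation: disease risk is elevated only when risk genotypes are carried at both loci, so the marginal effect of locus 1 shrinks as the interacting allele at locus 2 becomes rarer in the replication sample. Penetrances, frequencies and sample sizes below are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(11)

def marginal_power(p2, n=1000, p1=0.3, reps=500):
    """Power of a chi-square test on locus-1 carrier status when risk
    requires carrier status at BOTH loci (a pure interaction model).
    p2: frequency of the interacting allele at locus 2."""
    hits = 0
    for _ in range(reps):
        g1 = rng.binomial(2, p1, 2 * n) > 0          # carrier at locus 1
        g2 = rng.binomial(2, p2, 2 * n) > 0          # carrier at locus 2
        risk = np.where(g1 & g2, 0.13, 0.10)         # penetrance
        affected = rng.random(2 * n) < risk
        cases, controls = g1[affected][:n], g1[~affected][:n]
        table = [[cases.sum(), len(cases) - cases.sum()],
                 [controls.sum(), len(controls) - controls.sum()]]
        hits += chi2_contingency(table)[1] < 0.05
    return hits / reps

print("discovery (p2 = 0.5):  ", marginal_power(0.5))
print("replication (p2 = 0.3):", marginal_power(0.3))
```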

20.
A method for reconstructing allele frequencies characteristic of an original ethnically homogeneous population before the start of migration processes is described. Information on both the ethnic group studied and offspring of interethnic marriages is used to estimate the allele frequencies. This makes it possible to increase the informativeness of the sample, which, in the case of ethnic heterogeneity, depends not only on allele frequencies and the total sample size, but also on the ethnic structure of the sample. The problem of estimating allele frequency in an ethnically heterogeneous sample has been solved analytically for diallelic loci. It has been demonstrated that, if offspring of interethnic marriages with the same degree of outbreeding is added to a sample of the ethnic group studied, the sample informativeness does not change. To utilize the information contained in the phenotypes of the offspring of interethnic marriages, representatives of the population from which migration occurs should be included into the sample. The size of the sample ensuring the preassigned accuracy of estimation is minimized at a certain ratio between the numbers of the offspring of interethnic marriages and the "immigrants." To analyze polyallelic loci, a software package has been developed that allows estimating allele frequencies, determining the errors of these estimates, and planning the sample ensuring the preassigned accuracy of estimation. The package is available free at http://mga.bionet.bionet.nsc.ru/PopMixed/PopMixed.html.
