Similar Documents
20 similar documents found.
1.
Wang T, Elston RC. Human Heredity, 2005, 60(3): 134-142
The lack of replication of model-free linkage analyses of complex diseases raises questions about the robustness of these methods to various biases. The confounding effect of population stratification on genetic association studies has long been recognized in the genetic epidemiology community. Because the estimation of the number of alleles shared identical by descent (IBD) does not depend on marker allele frequencies when the founders of families are observed, model-free linkage analysis is usually thought to be robust to population stratification. However, for common complex diseases the genotypes of founders are often unobserved, so population stratification has the potential to impair model-free linkage analysis. Here, we demonstrate that, when some or all of the founder genotypes are missing, population stratification can adversely affect various model-free linkage methods and designs. For an affected sib pair design, it can cause excess false-positive discoveries even when the trait distribution is homogeneous among subpopulations. When a control group of discordant sib pairs is included, or for a quantitative trait, two conditions must both hold for population stratification to act as a confounder: the distributions of both the marker and the trait must be heterogeneous among subpopulations. When this occurs, the bias can result in either a liberal, and hence invalid, test or a conservative test. The bias can be eliminated or alleviated by including genotype data from founders or other family members. When this is not possible, new methods that are robust to population stratification need to be developed.

2.
This research is the first empirical attempt to quantify the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed the surname distributions of 26 Italian communities with different demographic features across the last six centuries (1447–2001). The degree of overlap between the "reference founding core" distributions and the distributions obtained by sampling the present-day communities with probabilistic and selective methods was quantified under different conditions and models. When only one individual per surname was considered (low-kinship model), the average discrepancy was 59.5%, with a peak of 84% under random sampling. When multiple individuals per surname were considered (high-kinship model), the discrepancy decreased by 8–30%, at the cost of a larger variance. Criteria aimed at maximizing locally spread patrilineages and long-term residency appeared to be affected by recent gene flow much more than expected. Selecting the more frequent family names under low-kinship criteria proved to be a suitable approach only for historically stable communities. In all other cases, true random sampling, despite its high variance, did not return more biased estimates than the selective methods. Our results indicate that sampling individuals who bear historically documented surnames (the founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics.

3.
The development of molecular markers has recently raised expectations for their application in selection programs. However, some questions related to quantitative trait locus (QTL) identification remain unanswered. The objectives of this paper are (1) to develop statistical genetic models for detecting and locating multiple QTL with additive, dominance and epistatic effects on the genome, using multiple linear regression analysis in the backcross and Fn generations derived from a cross between two inbred lines; and (2) to discuss the biases in genetic estimates caused by linked and unlinked QTL. Non-linear models were developed for different backcross and Fn generations, both with and without epistasis. Generation analysis of marked progenies is suggested as a way of increasing the number of observations for the estimates without additional molecular scoring costs: several groups of progenies can be created in different generations from the same scored individuals. The non-linear models were transformed into approximate multivariate linear models, to which combined stepwise and standard regression analysis could be applied. Expressions for the biases of the marker classes caused by linked QTL were obtained assuming no epistasis. When epistasis was assumed, these expressions became more complex, and the biases were caused by both linked and unlinked QTL.

4.
We present theoretical explanations and show through simulation that individual admixture proportion estimates obtained using ancestry-informative markers should be seen as error-contaminated measurements of the underlying individual ancestry proportions. These estimates can be used in structured association tests as a control variable to limit type I error inflation or reduce the loss of power due to the population stratification observed in studies of admixed populations. However, including such error-containing variables as covariates in regression models can bias parameter estimates and reduce the ability to control for the confounding effect of admixture in genetic association tests. Measurement error correction methods offer a way to overcome this problem but require an a priori estimate of the measurement error variance. We show how an upper bound on this variance can be obtained, present four measurement error correction methods applicable to this problem, and conduct a simulation study to compare their utility when the admixed population results from intermating between two ancestral populations. Our results show that the quadratic measurement error correction (QMEC) method performs better than the other methods and maintains the type I error at its nominal level.

5.
Scherag et al. [Hum Hered 2002;54:210-217] recently proposed point estimates and asymptotic as well as exact confidence intervals for genotype relative risks (GRRs) and the attributable risk (AR) in case-parent trio designs using single nucleotide polymorphism (SNP) data. The aim of this study was to investigate coverage probabilities and bias in the estimates when the marker locus is not identical to the disease locus. Using a variety of parameter constellations, including marker allele frequencies identical to and different from those of the SNP at the disease locus, we performed an analytical study to quantify the bias and a Monte Carlo simulation study to quantify both bias and coverage probabilities. No bias was observed when the marker and trait loci coincided. When they did not coincide, two parameters had a strong impact on the coverage probabilities of confidence intervals and on bias in point estimates: the linkage disequilibrium (LD) parameter delta and the allele frequency at the marker SNP. If the marker allele frequencies differed from the allele frequencies at the functional SNP, substantial biases occurred. Further, if delta between the marker and the disease locus was lower than the maximum possible delta, the estimates were also biased. In general, biases were towards the null hypothesis for both GRRs and AR. If one GRR was not increased, as in a recessive genetic model, biases away from the null could be observed. If both GRRs were in the same direction and both were substantially larger than 1, the bias was always towards the null. Great care is therefore needed when applying point estimates and confidence intervals for GRRs and AR in candidate gene studies. Effect estimates are substantially biased towards the null if either the allele frequencies at the marker SNP and the true disease locus differ or the LD between the marker SNP and the disease locus is not at its maximum. A bias away from the null occurs only in uncommon study situations; it is small and can therefore be ignored in applications.

6.

Background

When estimating marker effects in genomic selection, the estimates may simply act as a proxy for pedigree. Part of a marker's effect may be attributable to its association with superior parents rather than to linkage with any causative QTL, so such markers mainly explain polygenic effects rather than QTL effects. If a polygenic effect is included in a Bayesian model, however, the estimated marker effects are expected to be more persistent over generations, without the need to re-estimate them every generation, resulting in increased accuracy and reduced bias.

Methods

Genomic selection using the Bayesian method 'BayesB' was evaluated for different marker densities, with a polygenic effect included in the model (GWpEBV) and not included (GWEBV). Linkage disequilibrium and mutation-drift balance were established by simulating a population with an effective size (Ne) of 100 over 1,000 generations.

Results

Accuracy of selection was slightly higher for the model including a polygenic effect than for the model without it, regardless of marker density. Accuracy decreased in later generations, and this reduction was stronger at lower marker densities. However, no significant difference in accuracy was observed between the two models. The linear regression of true breeding value (TBV) on GWEBV and GWpEBV was used as a measure of bias. The regression coefficient was more stable over generations when a polygenic effect was included in the model, and was always between 0.98 and 1.00 at the highest marker density. The regression coefficient decreased more quickly at lower marker densities.

Conclusions

Including a polygenic effect had no impact on selection accuracy but reduced bias, which is especially important when genome-wide marker estimates are used to estimate breeding values over more than one generation.
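The bias measure used here, the slope of the regression of TBV on the genomic predictions, is easy to reproduce. The following is a minimal sketch on simulated data, not BayesB. A hypothetical predictor that is shrunk by the heritability (h2 = 0.5, an assumption) gives a slope near 1. The unshrunk, over-dispersed predictor gives a slope near h2.

```python
import numpy as np


def regression_slope(tbv, gebv):
    """Slope b in TBV = a + b * GEBV, i.e. cov(TBV, GEBV) / var(GEBV).

    b close to 1: predictions on the right scale; b < 1: predictions
    over-dispersed (inflated); b > 1: predictions shrunk too much.
    """
    c = np.cov(np.asarray(tbv, dtype=float), np.asarray(gebv, dtype=float))
    return c[0, 1] / c[1, 1]


rng = np.random.default_rng(42)
n = 20_000
h2 = 0.5                                 # hypothetical heritability
tbv = rng.normal(0.0, 1.0, n)            # true breeding values, variance 1
y = tbv + rng.normal(0.0, 1.0, n)        # record with residual variance 1, so h2 = 0.5

shrunk = h2 * y      # BLUP-like prediction, regressed toward the mean
raw = y              # unshrunk prediction

slope_shrunk = regression_slope(tbv, shrunk)   # expected to be close to 1
slope_raw = regression_slope(tbv, raw)         # expected to be close to h2
print(f"slope (shrunk) = {slope_shrunk:.3f}, slope (raw) = {slope_raw:.3f}")
```

A slope that drifts away from 1 across generations is the kind of instability the study reports for the model without a polygenic effect.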

7.
In population-based case-control association studies, the standard chi-square (χ²) test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias does not disappear with increasing sample size; on the contrary, the false-positive rate grows as the sample size increases. The usual family-based designs are robust against population stratification, but they are sensitive to genotyping error. In this article, we propose a novel method that simultaneously corrects for the bias arising from population stratification and/or genotyping error in case-control studies. The appropriate corrections depend on the sample odds ratios of the standard 2×3 tables of genotype by case-control status at null loci, so the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can also be used to estimate the effect of the genetic factor. We conducted a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with and without correction tends to become more extreme as the magnitude of the bias grows. The bias-correction method proposed in this article should therefore be useful for the genetic analysis of complex traits.
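The claim that stratification bias grows with sample size can be checked with expected counts. The sketch below uses the plain Pearson chi-square test, not the authors' corrected test. It makes several assumptions:

- two hypothetical subpopulations with allele frequencies 0.2 and 0.5,
- Hardy-Weinberg genotypes within each subpopulation,
- no true marker effect,
- cases drawn 30/70 and controls 70/30 from the two subpopulations.

Because every expected count scales with n, the statistic doubles whenever the sample size doubles.

```python
def hwe_genotypes(p):
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg for allele frequency p."""
    return (p * p, 2 * p * (1 - p), (1 - p) ** 2)


def mixed_genotypes(weights, freqs):
    """Genotype frequencies of a sample drawn from several subpopulations."""
    g = [0.0, 0.0, 0.0]
    for w, p in zip(weights, freqs):
        for j, f in enumerate(hwe_genotypes(p)):
            g[j] += w * f
    return g


def chi_square_2x3(cases, controls):
    """Pearson chi-square statistic for a 2x3 genotype-by-status table."""
    table = [cases, controls]
    row_tot = [sum(r) for r in table]
    col_tot = [a + b for a, b in zip(cases, controls)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat


# Hypothetical stratified design: the marker has no effect on disease,
# but cases are over-sampled from the subpopulation with allele frequency 0.5.
freqs = (0.2, 0.5)
case_mix = mixed_genotypes((0.3, 0.7), freqs)
control_mix = mixed_genotypes((0.7, 0.3), freqs)

for n in (500, 1000, 2000):
    # expected genotype counts with n cases and n controls
    stat = chi_square_2x3([n * f for f in case_mix], [n * f for f in control_mix])
    print(n, round(stat, 1))
```

Any fixed critical value, such as 5.99 for 2 degrees of freedom at the 5% level, is therefore eventually exceeded as n grows. This matches the behaviour described in the abstract.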

8.
Biased mutations and microsatellite variation (cited 10 times: 6 self-citations, 4 by others)
Mutation bias is one of the forces that may constrain variation at microsatellite loci. Here, we study the dynamics of population statistics and of the genetic distance between two populations under multiple stepwise mutations with linear bias and random drift. Expressions are derived for these statistics as functions of time, as well as at mutation-drift equilibrium. Applying these expressions to published data on humans and chimpanzees, the regression coefficient of mutation bias on allele size was estimated to be at least between -0.0064 and -0.013. The assumption of mutational bias produces larger estimates of divergence times than are obtained in its absence; in particular, the time of the split between African and non-African human populations is estimated to be between 183,000 and 222,000 years, assuming one-step mutations and no selection. With multistep mutations, the estimated divergence time is lower.

9.
We study bias arising from nonlinear transformations of random variables in random- or mixed-effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated with overdispersed binomial distributions, where we demonstrate considerable biases arising from the standard log-odds and arcsine transformations of the estimated probability, both for single-group studies and when combining results from several groups or studies in meta-analysis. Our simulations confirm that, for small values of the intracluster correlation coefficient ρ, these biases are linear in ρ. These biases do not depend on the sample sizes or on the number of studies K in a meta-analysis, and they result in abysmal coverage of the combined effect for large K. We also propose a bias correction for the arcsine transformation. Our simulations demonstrate that this bias correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence.
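The transformation bias described above can be computed exactly for the arcsine case. This sketch assumes a beta-binomial model with mean p and intracluster correlation ρ. It is parametrised so that a = p(1-ρ)/ρ and b = (1-p)(1-ρ)/ρ, and it sums over all outcomes. It does not implement the authors' proposed correction. Raising ρ increases the bias, while raising n changes it only modestly once ρ > 0.

```python
import math


def betabinom_pmf(k, n, p, rho):
    """Beta-binomial P(X = k) with mean n*p and intracluster correlation rho."""
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def arcsine_bias(n, p, rho):
    """Exact E[arcsin(sqrt(p_hat))] - arcsin(sqrt(p)) with p_hat = X / n."""
    expected = sum(
        betabinom_pmf(k, n, p, rho) * math.asin(math.sqrt(k / n))
        for k in range(n + 1)
    )
    return expected - math.asin(math.sqrt(p))


# Hypothetical prevalence p = 0.2; compare cluster correlations and sizes.
for rho in (0.01, 0.05, 0.1):
    for n in (50, 500):
        print(f"rho={rho:<5} n={n:<4} bias={arcsine_bias(n, 0.2, rho):+.4f}")
```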

10.
Photogrammetric mass-estimation techniques are increasingly being developed and applied in marine mammal studies. When a photogrammetrically estimated mass is used as a covariate in regression modeling, the error associated with estimating mass biases the regression statistics and decreases the model's explanatory power. It is therefore important to understand and account for prediction variance when addressing ecological questions that require estimated mass values. In a simulation study based on data collected from Weddell seals, we developed regression models of pup weaning mass as a function of maternal postparturition mass, first with maternal mass measured directly and second with maternal mass estimated photogrammetrically. We demonstrate that when estimated mass was used, the regression coefficient was biased toward zero and the coefficient of determination was 30% lower than when maternal postparturition mass was obtained by direct measurement. After applying bias-correction procedures, however, the regression coefficient and coefficient of determination were within 2% of their true values. To use photogrammetrically estimated masses effectively, prediction variance should be understood and accounted for in all analyses. The methods presented in this paper are simple and effective techniques for exploring and accounting for prediction variance.
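The attenuation toward zero reported above is the classical errors-in-variables result. With an independent prediction error of variance σ_u², the naive slope is multiplied by the reliability ratio λ = var(X)/(var(X) + σ_u²). The sketch below simulates hypothetical standardised data, with β = 0.5 and σ_u² = 0.49 both assumed. It applies a simple regression-calibration correction that divides by an estimated λ. The paper's own correction procedures may differ.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


rng = np.random.default_rng(7)
n = 100_000
beta_true = 0.5
sigma_u2 = 0.49                        # assumed-known prediction (error) variance

x_true = rng.normal(0.0, 1.0, n)                         # directly measured mass (standardised)
x_est = x_true + rng.normal(0.0, np.sqrt(sigma_u2), n)   # photogrammetric estimate
y = 2.0 + beta_true * x_true + rng.normal(0.0, 0.5, n)   # pup weaning mass (arbitrary units)

naive = ols_slope(x_est, y)            # attenuated toward zero, about 0.5 / 1.49

# Regression calibration: lambda = var(X_true) / var(X_est),
# with var(X_true) estimated as var(X_est) - sigma_u2.
var_est = np.var(x_est, ddof=1)
lam = (var_est - sigma_u2) / var_est
corrected = naive / lam                # should be close to beta_true
print(f"naive = {naive:.3f}, corrected = {corrected:.3f}")
```

The correction only works if the prediction variance is known or estimated. This is why the abstract stresses understanding it before using estimated masses.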

11.
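The gene action controlling quantitative traits has long been an important topic for geneticists. For codominant animal molecular marker data arranged in the form of an orthogonal table, this paper fitted a general molecular marker regression equation based on a genetic model of quantitative traits controlled by genes with additive effects. The results show that a molecular marker regression equation fitted to codominant marker data for additive-effect genes, arranged as an orthogonal table, can be used to estimate the relative differences in effect between additive alleles and can serve as a basis for breeding.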

12.
Despite increased interest in applying single nucleotide polymorphism (SNP) data to questions in natural systems, one unresolved issue is the extent to which the ascertainment bias induced during the SNP discovery phase will affect available analysis methods. Most studies addressing ascertainment bias have focused on human populations. It is therefore not clear whether existing methods will work when applied to other species with more complex demographic histories and higher levels of population structure. Here we present findings from an empirical exploration of the effect of population structure on ascertainment bias in the Eastern Fence Lizard, Sceloporus undulatus. We found that frequency spectra and summary statistics were highly sensitive to SNP discovery strategy, necessitating careful selection of the initial ascertainment panel. Randomly selected ascertainment panels performed as well as panels chosen to jointly sample geographic, phenotypic and genetic diversity. Geographically restricted panels resulted in larger biases. Existing ascertainment bias correction methods were not developed for geographically structured data sets. Even so, we found that they were largely effective at reducing the impact of ascertainment bias. Because the correction methods performed well even when their underlying assumptions were violated, our results suggest that tools are already available to analyze SNP data in structured populations.
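Part of ascertainment bias comes from the discovery panel itself, because a site is only discovered if both alleles appear among the panel's chromosomes. The sketch below shows the discovery probability and a simple inverse-probability reweighting of a frequency spectrum. It assumes the panel is a random draw of d chromosomes from the same population whose allele frequency p is of interest. Geographic structure can violate exactly this assumption. The helper names are hypothetical, and this is not one of the specific correction methods evaluated in the study.

```python
def ascertainment_prob(p, d):
    """Probability that a biallelic site with allele frequency p is seen as
    polymorphic in a discovery panel of d chromosomes (both alleles drawn)."""
    return 1.0 - p ** d - (1.0 - p) ** d


def reweight_counts(freqs, observed_counts, d):
    """Inverse-probability correction: divide the SNP count in each
    frequency bin by the chance that such SNPs were discovered."""
    return [c / ascertainment_prob(p, d) for p, c in zip(freqs, observed_counts)]


# Rare variants are under-discovered, and more so with a small panel.
for d in (2, 8, 20):
    probs = [round(ascertainment_prob(p, d), 3) for p in (0.02, 0.05, 0.2, 0.5)]
    print(f"panel of {d:>2} chromosomes: {probs}")
```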

13.

Background  

When analysing microarray and other small-sample biological datasets, care is needed to avoid various biases. We analyse one form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples into the training and test sets and to the dependency between these stratification errors: the variations in class proportions in the training and test sets are negatively correlated.
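The negative correlation behind stratification bias follows from a fixed total: every positive sample that lands in the training set is missing from the test set. The following minimal sketch uses repeated random hold-out splits of a hypothetical balanced dataset (40 samples, 30 for training). It shows a correlation of -1 between training and test class proportions. Stratified splitting removes the variation entirely.

```python
import random


def class_proportions(labels, n_train, n_splits, stratified, seed=0):
    """Positive-class proportions in train and test sets over repeated
    random hold-out splits of a binary-labelled dataset."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = len(labels)
    train_props, test_props = [], []
    for _ in range(n_splits):
        if stratified:
            # keep the overall class ratio (up to rounding) in the training set
            k_pos = round(n_train * len(pos) / n)
            train = rng.sample(pos, k_pos) + rng.sample(neg, n_train - k_pos)
        else:
            train = rng.sample(range(n), n_train)
        train_set = set(train)
        test = [i for i in range(n) if i not in train_set]
        train_props.append(sum(labels[i] for i in train) / len(train))
        test_props.append(sum(labels[i] for i in test) / len(test))
    return train_props, test_props


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


labels = [1] * 20 + [0] * 20
tr, te = class_proportions(labels, n_train=30, n_splits=200, stratified=False)
# test proportion is an affine, decreasing function of train proportion
print(round(pearson(tr, te), 3))
tr_s, te_s = class_proportions(labels, n_train=30, n_splits=200, stratified=True)
print(set(tr_s), set(te_s))   # constant proportions under stratification
```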

14.
Understanding the responses of forests to increasing CO2 and temperature is an important but difficult challenge. Tree rings are increasingly used to study such responses. In a recent study, van der Sleen et al. (2014, Nature Geoscience, 8, 4) used tree rings from 12 tropical tree species and found that, despite increases in intrinsic water-use efficiency, no growth stimulation was observed. This challenges the idea that increasing CO2 would stimulate growth. Unfortunately, tree-ring analysis can be plagued by biases that produce spurious growth trends. Although their study evaluated several biases, it did not account for all of them. In particular, one bias may have seriously affected their results. Several of the species have recruitment patterns that are not uniform but clustered around one specific year. This produces spurious negative growth trends if growth rates are calculated in fixed size classes. Fast-growing trees reach the sampling diameter earlier than slow growers, so fast growth rates tend to have earlier calendar dates. We assessed the effect of this 'nonuniform age bias' on the observed growth trends and found that van der Sleen's conclusion of a lack of growth stimulation does not hold. Growth trends are, at least partially, driven by the underlying recruitment or age distributions. Species with more clustered age distributions show more negative growth trends, and simulations estimating the effect of species' age distributions produce growth trends close to those observed. Re-evaluating the growth data and correcting for the bias yields significant positive growth trends of 1–2% per decade for the full period, and 3–7% since 1950. These observations, however, should be treated cautiously, as multiple biases affect these trend estimates. Overall, our results highlight that tree-ring studies of long-term growth trends can be strongly influenced by biases if demographic processes are not carefully accounted for.
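The mechanism of the nonuniform age bias can be reproduced with trees whose growth never changes over time. This hypothetical sketch makes three assumptions:

- each tree grows at its own constant diameter rate,
- growth is sampled when a tree reaches 20 cm,
- all trees have passed that size before sampling.

The regression of growth on the calendar year of that size class is then negative. It is much more strongly negative when recruitment is clustered in a single year than when it is spread over 200 years.

```python
import numpy as np


def fixed_size_trend(recruit_year, growth, size_class=20.0):
    """Slope of growth rate regressed on the calendar year at which each
    tree reaches the sampled diameter (growth units per year)."""
    year_at_size = recruit_year + size_class / growth   # constant-rate growth
    return np.polyfit(year_at_size, growth, 1)[0]


rng = np.random.default_rng(3)
n_trees = 500
# Hypothetical trees: each grows at its own constant rate (cm diameter per year);
# by construction there is NO change in growth over calendar time.
growth = rng.lognormal(mean=np.log(0.4), sigma=0.4, size=n_trees)

clustered = np.full(n_trees, 1900.0)             # all recruited in one year
spread = rng.uniform(1700.0, 1900.0, n_trees)    # recruitment spread over 200 years

slope_clustered = fixed_size_trend(clustered, growth)
slope_spread = fixed_size_trend(spread, growth)
print(f"clustered: {slope_clustered:.5f}, spread: {slope_spread:.5f}")
```

Here fast growers reach 20 cm first, so early calendar years are populated by fast growth even though no tree ever changes its rate. Spreading recruitment over time dilutes, but in this simple setting does not remove, the spurious trend.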

15.
MOTIVATION: Population-based association mapping may be subject to bias caused by population stratification. Alternative methods that are robust to population stratification, such as family-based linkage analysis, have lower mapping resolution. Recently, various statistical methods robust to population stratification have been proposed for association studies that use unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates. The residual distribution is customarily assumed to come from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for many real-life datasets. RESULTS: In this article, we propose a new structured association (SA) test. Our method corrects for continuous population stratification in two steps. It first derives population structure and kinship matrices from a set of random genetic markers. It then models the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, in which the error distribution is modeled as a mixture of Polya trees centered around a normal family of distributions. We compared our model with existing SA tests in terms of model fit, type I error rate, power, precision and accuracy by applying it to a real dataset as well as to simulated datasets.

16.
Genetic variances and covariances, summarized in G matrices, are key determinants of the course of adaptive evolution. Consequently, understanding how G matrices vary among populations is critical to answering a variety of questions in evolutionary biology. A method has recently been proposed for generating null distributions of statistics pertaining to differences in G matrices among populations. The general approach facilitated by this method is likely to prove very important in studies of the evolution of G. We have identified an issue in the method that will cause it to create null distributions of differences in G matrices that are likely to be far too narrow. The issue arises from how the method, as currently used, generates these null distributions. It simulates breeding value vectors based on G matrices estimated from data and randomizes these vectors across populations. It then calculates null values of the statistics from G matrices computed directly from the variances and covariances among the randomized vectors. This calculation treats breeding values as directly measurable quantities, instead of as predictions from G matrices that are themselves estimated from patterns of covariance among kin. The existing method thus neglects a major source of uncertainty in G matrices, which renders it anticonservative. We first suggest a correction to the method. We then apply the original and modified methods to a very simple instructive scenario. Finally, we demonstrate the use of both methods in the analysis of a real data set.

17.
We compared genetic variation and population differentiation at RFLP marker loci with seven quantitative characters in Mycosphaerella graminicola populations sampled from four regions. The characters were fungicide resistance, temperature sensitivity, pycnidial size, pycnidial density, colony size, percentage of leaf area covered by pycnidia (PLACP) and percentage of leaf area covered by lesions (PLACL). Population differentiation varied widely across the quantitative traits assayed. Fungicide resistance, temperature sensitivity and PLACP displayed a significantly higher Q_ST than G_ST, consistent with selection for local adaptation. Pycnidial size, pycnidial density and colony size displayed a lower or significantly lower Q_ST than G_ST, consistent with constraining selection. There was no significant difference between Q_ST and G_ST for PLACL. We also found a positive and significant correlation between genetic variation at molecular marker loci and in quantitative traits at the multitrait scale. This suggests that estimates of overall genetic variation for quantitative traits in M. graminicola could be derived from analysis of molecular genetic markers.
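The Q_ST versus G_ST comparison rests on two simple ratios. The following is a minimal sketch. In its usual diploid form, Q_ST = σ²_B/(σ²_B + 2σ²_W), where σ²_B and σ²_W are the between- and within-population genetic variance components. For a haploid organism such as M. graminicola the factor 2 is commonly dropped; that is an assumption here, exposed through the hypothetical `ploidy` argument. G_ST follows Nei, (H_T - H_S)/H_T, at a single biallelic locus with equally weighted populations.

```python
def q_st(var_between, var_within, ploidy=2):
    """Q_ST from between- and within-population genetic variance components.

    Diploid form: s2_B / (s2_B + 2 * s2_W); haploid form (assumed): s2_B / (s2_B + s2_W).
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    factor = 2.0 if ploidy == 2 else 1.0
    return var_between / (var_between + factor * var_within)


def g_st(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T at one biallelic locus, equal population weights."""
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)   # mean within-pop diversity
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)                                    # total diversity
    return (h_t - h_s) / h_t


# Hypothetical numbers: a trait with Q_ST above the marker G_ST suggests
# diversifying selection; below it, constraining selection.
print(q_st(0.4, 0.6, ploidy=1), g_st([0.2, 0.8]))
```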

18.
The statistical interpretation of forensic genetic evidence requires allele frequency estimates for the studied markers in the reference population. Differences in the genetic makeup of populations can be reflected in statistically different allele frequency distributions. Collecting such information for every population is not always possible, so alternative approaches are needed to compensate for the missing information. A number of statistics have been proposed to control for population stratification in paternity testing and forensic casework, with the Fst correction being the only one recommended by the forensic community. In this study we aimed to evaluate the performance of Fst in correcting for population stratification in forensics. Using simulations, we first tested the dependence of Fst on the relative sizes of the subpopulations. Second, we measured the effect of Fst corrections on Paternity Index (PI) values compared with those obtained using the local reference database. The results provide clear-cut evidence for two conclusions. (i) Fst values are strongly dependent on the sampling scheme, so in most situations it would be almost impossible to estimate true Fst values. (ii) Fst corrections might unfairly adjust PI values for stratification, which suggests using local databases whenever possible to estimate the frequencies of genetic profiles and PI values.
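The Fst (θ) correction discussed here is usually applied through the Balding-Nichols conditional genotype probabilities, which replace p² and 2pq in likelihood-ratio calculations. The sketch below implements those two formulas as a minimal illustration. The abstract does not state how they enter the Paternity Index in this particular study, so no PI calculation is attempted. With θ = 0 they reduce to the product rule. For rare alleles a positive θ raises the probabilities, which lowers any likelihood ratio that has them in the denominator.

```python
def bn_homozygote(p, theta):
    """Balding-Nichols probability of genotype AA given that AA has already
    been observed, with coancestry coefficient theta (Fst)."""
    return ((2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
            / ((1 + theta) * (1 + 2 * theta)))


def bn_heterozygote(p, q, theta):
    """Balding-Nichols probability of genotype AB given that AB has already
    been observed, with coancestry coefficient theta (Fst)."""
    return (2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q)
            / ((1 + theta) * (1 + 2 * theta)))


# Hypothetical allele frequencies p = 0.1 and q = 0.2.
for theta in (0.0, 0.01, 0.03):
    print(f"theta={theta}: AA={bn_homozygote(0.1, theta):.4f}, "
          f"AB={bn_heterozygote(0.1, 0.2, theta):.4f}")
```

Because the size of the adjustment depends directly on the θ value chosen, the sampling dependence of Fst reported above carries straight through to the corrected probabilities.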

19.
Comparing predicted breeding values (BV) among animals in different management units (e.g. flocks, herds) is challenging if the units have different genetic means. Unbiased estimates of differences in BV may be obtained by assigning base animals to genetic groups according to their unit of origin, but units must be connected to estimate group effects. If many small groups exist, the error of BV prediction may increase. Alternatively, genetic groups can be excluded from the statistical model, which may bias BV predictions; if adequate genetic connections exist among units, this bias is reduced. Several measures of connectedness have been proposed, but their relationships to potential bias in BV predictions are not well defined. This study compares alternative strategies for connecting small units and assesses the ability of different connectedness statistics to quantify potential bias in BV prediction. Connections established using common sires across units were most effective in reducing bias. The coefficient of determination of the mean difference in predicted BV was a perfect indicator of the potential bias remaining when comparing individuals in separate units. However, this measure is difficult to calculate. Correlated measures, such as prediction errors of differences in unit means and correlations among prediction errors, are suggested as practical alternatives.

20.
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the amino acid composition of proteins. Integral inner membrane proteins always form a separate cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less significant biases are driven by the rare amino acids cysteine and tryptophan. Some of these allow identification of species-specific functions or localisations, such as surface or exported proteins. Correspondence analysis also reveals errors in genome annotations, making it useful for quality control and correction.
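Correspondence analysis itself is short to write down. This is a minimal sketch of classical CA via the singular value decomposition of the standardised residuals, applied to a hypothetical proteins × amino-acid count table. The study's own software and preprocessing are not described. The total inertia (sum of squared singular values) equals the table's chi-square statistic divided by its grand total.

```python
import numpy as np


def correspondence_analysis(counts):
    """Classical correspondence analysis of a contingency table via SVD.

    Returns row principal coordinates, column principal coordinates and
    the principal inertias (squared singular values).
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # D_r^-1/2 (P - r c^T) D_c^-1/2
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]        # D_r^-1/2 U Sigma
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]     # D_c^-1/2 V Sigma
    return rows, cols, sv ** 2


# Hypothetical counts: 4 proteins (rows) x 3 amino acids (columns).
table = np.array([[10, 5, 3],
                  [4, 8, 6],
                  [2, 3, 12],
                  [7, 7, 7]], dtype=float)
rows, cols, inertia = correspondence_analysis(table)
print("share of inertia per axis:", np.round(inertia / inertia.sum(), 3))
```

Proteins that plot close together in the first few row coordinates have similar amino acid profiles. Clusters such as the inner membrane proteins described above would appear this way.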
