Similar Articles
20 similar articles found.
1.
Various family-based association methods have recently been proposed that allow testing for linkage in the presence of linkage disequilibrium between a marker and a disease even if there is only incomplete parental-genotype information. For some families, it may be possible to reconstruct missing parental genotypes from the genotypes of their offspring. Treating such a reconstructed family as if parental genotypes have been typed, however, can introduce bias. The reconstruction-combined transmission/disequilibrium test (RC-TDT) and its X-chromosomal counterpart, XRC-TDT, employ parental-genotype reconstruction and correct for the biases involved in this reconstruction without relying on population marker allele frequencies. For the two tests, exact P values can be obtained by numerically calculating the convolution of the null distributions corresponding to the families in the sample.
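The convolution step described in the abstract can be sketched in a few lines. This is a generic illustration of convolving per-family discrete null distributions to obtain an exact upper-tail P value for a sum statistic — not the RC-TDT's actual family-specific null distributions; the toy 0/1 distributions below are hypothetical.

```python
def convolve(d1, d2):
    """Convolve two discrete distributions given as {value: probability} dicts."""
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return out

def exact_p_value(family_nulls, observed):
    """Exact upper-tail P value, P(sum >= observed), for the statistic summed
    over independent families, each with its own discrete null distribution."""
    total = {0: 1.0}
    for null in family_nulls:
        total = convolve(total, null)
    return sum(p for v, p in total.items() if v >= observed)

# Hypothetical example: three families, each contributing a statistic that is
# 0 or 1 with equal probability under the null.
fams = [{0: 0.5, 1: 0.5}] * 3
p = exact_p_value(fams, 3)  # P(sum >= 3) = 0.5**3 = 0.125
```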

2.
Once genetic linkage has been identified for a complex disease, the next step is often association analysis, in which single-nucleotide polymorphisms (SNPs) within the linkage region are genotyped and tested for association with the disease. If a SNP shows evidence of association, it is useful to know whether the linkage result can be explained, in part or in full, by the candidate SNP. We propose a novel approach that quantifies the degree of linkage disequilibrium (LD) between the candidate SNP and the putative disease locus through joint modeling of linkage and association. We describe a simple likelihood of the marker data conditional on the trait data for a sample of affected sib pairs, with disease penetrances and disease-SNP haplotype frequencies as parameters. We estimate model parameters by maximum likelihood and propose two likelihood-ratio tests to characterize the relationship of the candidate SNP and the disease locus. The first test assesses whether the candidate SNP and the disease locus are in linkage equilibrium so that the SNP plays no causal role in the linkage signal. The second test assesses whether the candidate SNP and the disease locus are in complete LD so that the SNP or a marker in complete LD with it may account fully for the linkage signal. Our method also yields a genetic model that includes parameter estimates for disease-SNP haplotype frequencies and the degree of disease-SNP LD. Our method provides a new tool for detecting linkage and association and can be extended to study designs that include unaffected family members.
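Both proposed tests reduce to comparing maximized log-likelihoods of nested models. A minimal sketch of a one-degree-of-freedom likelihood-ratio test P value, using the closed-form chi-square(1) survival function; the log-likelihood values below are hypothetical, not taken from the paper.

```python
import math

def lrt_pvalue_df1(loglik_null, loglik_alt):
    """Likelihood-ratio test P value for one constrained parameter (df = 1).
    Uses the identity chi2(1).sf(x) = erfc(sqrt(x / 2))."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical maximized log-likelihoods: full model vs. the model
# constrained to linkage equilibrium between SNP and disease locus.
p = lrt_pvalue_df1(loglik_null=-1234.6, loglik_alt=-1230.1)
```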

3.
OBJECTIVES: Confidence intervals for genotype relative risks, for allele frequencies and for the attributable risk in the case parent trio design for candidate-gene studies are proposed which can be easily calculated from the observed familial genotype frequencies. METHODS: Likelihood theory and the delta method were used to derive point estimates and confidence intervals. We used Monte Carlo simulations to show the validity of the formulae for a variety of given modes of inheritance and allele frequencies and illustrated their usefulness by applying them to real data. RESULTS: Generally these formulae were found to be valid for 'sufficiently large' sample sizes. For smaller sample sizes the estimators for genotype relative risks tended to be conservative whereas the estimator for attributable risk was found to be anti-conservative for moderate to high allele frequencies. CONCLUSIONS: Since the proposed formulae provide quantitative information on the individual and epidemiological relevance of a genetic variant they might be a useful addition to the traditional statistical significance level of TDT results.
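As a flavor of the delta-method construction, here is a generic Wald interval for a relative risk estimated as a ratio of two proportions, built on the log scale. The paper's trio-specific formulae differ; the counts below are hypothetical.

```python
import math

def log_ratio_ci(x1, n1, x2, n2, z=1.959964):
    """Delta-method (Wald) 95% CI for the risk ratio (x1/n1)/(x2/n2):
    the interval is built on the log scale and exponentiated back."""
    rr = (x1 / n1) / (x2 / n2)
    se = math.sqrt((1 - x1 / n1) / x1 + (1 - x2 / n2) / x2)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical counts: 30/100 events in one group vs 15/100 in the other.
rr, lo, hi = log_ratio_ci(30, 100, 15, 100)  # RR = 2.0
```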

4.
The recent large genotyping studies have identified a new repertoire of disease susceptibility loci of unknown function, characterized by high allele frequencies and low relative risks, lending support to the common disease-common variant (CDCV) hypothesis. The variants explain a much larger proportion of the disease etiology, measured by the population attributable fraction, than of the familial risk. We show here that if the identified polymorphisms were markers of rarer functional alleles they would explain a much larger proportion of the familial risk. For example, in a plausible scenario where the marker is 10 times more common than the causative allele, the excess familial risk of the causative allele is over 10 times higher than that of the marker allele. However, the population attributable fractions of the two alleles are equal. The penetrance mode of the causative locus may be very difficult to deduce from the apparent penetrance mode of the marker locus.
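The marker-versus-causative-allele argument can be checked numerically. A sketch under simplifying assumptions not stated in the abstract (multiplicative risks, every causative allele carried on a marker background, and excess familial risk taken proportional to p(1-p)(r-1)^2); all frequencies and risks below are hypothetical.

```python
def paf(freq, rr):
    """Population attributable fraction for a variant with carrier
    frequency `freq` and relative risk `rr` (Levin's formula)."""
    return freq * (rr - 1) / (freq * (rr - 1) + 1)

def excess_familial_risk(freq, rr):
    """Excess familial risk contributed by one variant, taken proportional
    to the risk variance p(1-p)(r-1)^2 (scaling constant omitted)."""
    return freq * (1 - freq) * (rr - 1) ** 2

q, r = 0.01, 2.0              # causative allele: frequency 1%, RR 2.0
p_m = 10 * q                  # marker allele 10x more common
rr_m = 1 + (r - 1) * q / p_m  # diluted marker relative risk = 1.1

paf_causal, paf_marker = paf(q, r), paf(p_m, rr_m)  # equal PAFs
ratio = excess_familial_risk(q, r) / excess_familial_risk(p_m, rr_m)
# ratio = 0.99 / 0.09 = 11: causal allele explains >10x the familial risk
```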

5.
The coverage probabilities of several confidence limit estimators of genetic parameters, obtained from North Carolina I designs, were assessed by means of Monte Carlo simulations. The reliability of the estimators was compared under three different parental sample sizes. The coverage of confidence intervals set on the Normal distribution, and using standard errors either computed by the “delta” method or derived using an approximation for the variance of a variance component estimated by means of a linear combination of mean squares, was affected by the number of males and females included in the experiment. The “delta” method was found to provide reliable standard errors of the genetic parameters only when at least 48 males were each mated to six different females randomly selected from the reference population. Formulae are provided for obtaining “delta” method standard errors, and appropriate statistical software procedures are discussed. The error rates of confidence limits based on the Normal distribution and using standard errors obtained by an approximation for the variance of a variance component varied widely. The coverage of F-distribution confidence intervals for heritability estimates was not significantly affected by parental sample size and consistently provided a mean coverage near the stated coverage. For small parental sample sizes, confidence intervals for heritability estimates should be based on the F-distribution.

6.
Anderson AD, Weir BS. Genetics 2007, 176(1):421-440
A maximum-likelihood estimator for pairwise relatedness is presented for the situation in which the individuals under consideration come from a large outbred subpopulation of the population for which allele frequencies are known. We demonstrate via simulations that a variety of commonly used estimators that do not take this kind of misspecification of allele frequencies into account will systematically overestimate the degree of relatedness between two individuals from a subpopulation. A maximum-likelihood estimator that includes F(ST) as a parameter is introduced with the goal of producing the relatedness estimates that would have been obtained if the subpopulation allele frequencies had been known. This estimator is shown to work quite well, even when the value of F(ST) is misspecified. Bootstrap confidence intervals are also examined and shown to exhibit close to nominal coverage when F(ST) is correctly specified.

7.
Lu Xia, Bin Nan, Yi Li. Biometrics 2023, 79(1):344-357
Modeling and drawing inference on the joint associations between single-nucleotide polymorphisms and a disease has sparked interest in genome-wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this “large n, diverging p” scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large-scale hospital-based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.

8.
Population stratification is a form of confounding by ethnicity that may cause bias to effect estimates and inflate test statistics in genetic association studies. Unlinked genetic markers have been used to adjust test statistics, but their use in correcting biased effect estimates has not been addressed. We evaluated the potential of bias correction that could be achieved by a single null marker (M) in studies involving one candidate gene (G). When the distribution of M varied greatly across ethnicities, controlling for M in a logistic regression model substantially reduced biases on odds ratio estimates. When M had the same distribution as G across ethnicities, biases were further reduced or eliminated by subtracting the regression coefficient of M from the coefficient of G in the model, which was fitted either with or without a multiplicative interaction term between M and G. Correction of bias due to population stratification depended specifically on the distributions of G and M, the difference between baseline disease risks across ethnicities, and whether G had an effect on disease risk or not. Our results suggested that marker choice and the specific treatment of that marker in analysis greatly influenced bias correction.

9.
The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risk estimates for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals, and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach, to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection, and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study) we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses. We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the Rh2 criteria of Stram et al. [1]) the differences between the three methods are very small and in particular that the single imputation method may be expected to work extremely well.

10.
Studies of genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. However, with the presence of null alleles, an observed genotype can represent one of several possible true genotypes. This results in biased estimates of relatedness. As the numbers of marker loci are often limited, loci with null alleles cannot be abandoned without substantial loss of statistical power. Here, we show how loci with null alleles can be incorporated into six estimators of relatedness (two novel). We evaluate the performance of various estimators before and after correction for null alleles. If the frequency of a null allele is <0.1, some estimators can be used directly without adjustment; if it is >0.5, the potency of estimation is too low and such a locus should be excluded. We make available a software package entitled PolyRelatedness v1.6, which enables researchers to optimize these estimators to best fit a particular data set.

11.
J Benichou, M H Gail. Biometrics 1990, 46(4):991-1003
The attributable risk (AR), defined as AR = [Pr(disease) - Pr(disease | no exposure)]/Pr(disease), measures the proportion of disease risk that is attributable to an exposure. Recently Bruzzi et al. (1985, American Journal of Epidemiology 122, 904-914) presented point estimates of AR based on logistic models for case-control data to allow for confounding factors and secondary exposures. To produce confidence intervals, we derived variance estimates for AR under the logistic model and for various designs for sampling controls. Calculations for discrete exposure and confounding factors require covariances between estimates of the risk parameters of the logistic model and the proportions of cases with given levels of exposure and confounding factors. These covariances are estimated from Taylor series expansions applied to implicit functions. Similar calculations for continuous exposures are derived using influence functions. Simulations indicate that those asymptotic procedures yield reliable variance estimates and confidence intervals with near nominal coverage. An example illustrates the usefulness of variance calculations in selecting a logistic model that is neither so simplified as to exhibit systematic lack of fit nor so complicated as to inflate the variance of the estimate of AR.
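The defining formula is straightforward to compute directly; the probabilities below are hypothetical, chosen only to illustrate the definition.

```python
def attributable_risk(p_disease, p_disease_unexposed):
    """AR = [Pr(disease) - Pr(disease | no exposure)] / Pr(disease):
    the proportion of disease risk attributable to the exposure."""
    return (p_disease - p_disease_unexposed) / p_disease

# Hypothetical: overall risk 5%, risk among the unexposed 3%.
ar = attributable_risk(0.05, 0.03)  # 40% of risk attributable to exposure
```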

12.
Standard theory provides a simple prediction for the frequency of a recessive lethal allele conferring heterozygous protection against an infectious disease (the best-known example being sickle cell protection against malaria). This relationship allows historic disease mortality rates to be estimated. There are, however, hidden biases in this approach. Reproductively active human females in archaic societies normally produce children at intervals of around 4 years. If death of the fetus or young infant (less than around 3 years of age) occurs, then the mother re-enters oestrus and produces another child. This 'reproductive compensation' reduces selection against the agent causing early mortality (the recessive allele or infective agent) and biases our estimates of historic mortality rates. The magnitude of these biases is investigated. Re-conception also constitutes a demographic selective pressure acting alongside natural selection: lethal genetic diseases (or tightly linked loci) will be selected to become ever more virulent, killing at ever decreasing ages, to allow the mother to re-enter oestrus and re-conceive a (hopefully unaffected) sibling; this effect also invalidates statistical tests using the number of alleles to distinguish overdominance from drift as explanations for high allele frequency. The same bias affects calculations of mutation/selection balance: for any given mutation rate, syndromes which kill early in life will reach much higher frequencies than those killing at later ages. An intriguing consequence is that lethal recessive disorders in humans will increase in frequency by up to 45% as a consequence of the recent demographic transition to planned family size.

13.
Deng HW, Li YM, Li MX, Liu PY. Human Heredity 2003, 56(4):160-165
Hardy-Weinberg disequilibrium (HWD) measures have been proposed using dense markers to fine map a quantitative trait locus (QTL) to regions of less than approximately 1 cM. Earlier HWD measures may introduce bias in the fine mapping because they are dependent on marker allele frequencies across loci. Hence, HWD indices that do not depend on marker allele frequencies are desired for fine mapping. Based on our earlier work, here we present four new HWD indices that do not depend on marker allele frequencies. Two are for use when marker allele frequencies in a study population are known, and two are for use when marker allele frequencies in a study population are not known and are only known in the extreme samples. The new measures are a function of the genetic distance between the marker locus and a QTL. Through simulations, we investigated and compared the fine mapping performance of the new HWD measures with that of the earlier ones. Our results show that when marker allele frequencies vary across loci, the new measures presented here are more robust and powerful.

14.
Recent admixture between genetically differentiated populations can result in high levels of association between alleles at loci that are ≤10 cM apart. The transmission/disequilibrium test (TDT) proposed by Spielman et al. (1993) can be a powerful test of linkage between disease and marker loci in the presence of association and therefore could be a useful test of linkage in admixed populations. The degree of association between alleles at two loci depends on the differences in allele frequencies, at the two loci, in the founding populations; therefore, the choice of marker is important. For a multiallelic marker, one strategy that may improve the power of the TDT is to group marker alleles within a locus, on the basis of information about the founding populations and the admixed population, thereby collapsing the marker into one with fewer alleles. We have examined the consequences of collapsing a microsatellite into a two-allele marker, when two founding populations are assumed for the admixed population, and have found that if there is random mating in the admixed population, then typically there is a collapsing for which the power of the TDT is greater than that for the original microsatellite marker. A method is presented for finding the optimal collapsing that has minimal dependence on the disease and that uses estimates either of marker allele frequencies in the two founding populations or of marker allele frequencies in the current, admixed population and in one of the founding populations. Furthermore, this optimal collapsing is not always the collapsing with the largest difference in allele frequencies in the founding populations. To demonstrate this strategy, we considered a recent data set, published previously, that provides frequency estimates for 30 microsatellites in 13 populations.

15.
The risk difference is an intelligible measure for comparing disease incidence in two exposure or treatment groups. Despite its convenience in interpretation, it is less prevalent in epidemiological and clinical areas where regression models are required in order to adjust for confounding. One major barrier to its popularity is that standard linear binomial or Poisson regression models can provide estimated probabilities out of the range of (0,1), resulting in possible convergence issues. For estimating adjusted risk differences, we propose a general framework covering various constraint approaches based on binomial and Poisson regression models. The proposed methods span the areas of ordinary least squares, maximum likelihood estimation, and Bayesian inference. Compared to existing approaches, our methods prevent estimates and confidence intervals of predicted probabilities from falling out of the valid range. Through extensive simulation studies, we demonstrate that the proposed methods solve the issue of having estimates or confidence limits of predicted probabilities out of (0,1), while offering performance comparable to its alternative in terms of the bias, variability, and coverage rates in point and interval estimation of the risk difference. An application study is performed using data from the Prospective Registry Evaluating Myocardial Infarction: Event and Recovery (PREMIER) study.
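For intuition, here is an unadjusted risk difference with a Wald interval truncated to the valid range. This is a deliberate simplification, not the paper's constrained-regression framework, and the counts are hypothetical.

```python
import math

def risk_difference_ci(x1, n1, x0, n0, z=1.959964):
    """Unadjusted risk difference p1 - p0 with a Wald 95% CI, truncated to
    the valid range [-1, 1] in the spirit of constrained estimation."""
    p1, p0 = x1 / n1, x0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    lo = max(-1.0, rd - z * se)
    hi = min(1.0, rd + z * se)
    return rd, lo, hi

# Hypothetical counts: 40/200 events in the exposed group, 25/200 unexposed.
rd, lo, hi = risk_difference_ci(40, 200, 25, 200)  # RD = 0.075
```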

16.
Next Generation Sequencing (NGS) has revolutionized biomedical research in recent years. It is now commonly used to identify rare variants through resequencing individual genomes. Due to the cost of NGS, researchers have considered pooling samples as a cost-effective alternative to individual sequencing. In this article, we consider the estimation of allele frequencies of rare variants through the NGS technologies with pooled DNA samples with or without barcodes. We consider three methods for estimating allele frequencies from such data, including raw sequencing counts, inferred genotypes, and expected minor allele counts, and compare their performance. Our simulation results suggest that the estimator based on inferred genotypes overall performs better than or as well as the other two estimators. When the sequencing coverage is low, biases and MSEs can be sensitive to the choice of the prior probabilities of genotypes for the estimators based on inferred genotypes and expected minor allele counts so that more accurate specification of prior probabilities is critical to lower biases and MSEs. Our study shows that the optimal number of barcodes in a pool is relatively robust to the frequencies of rare variants at a specific coverage depth. We provide general guidelines on using DNA pooling with barcoding for the estimation of allele frequencies of rare variants.
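The simplest of the three estimators — raw sequencing counts — can be sketched directly: pool the minor-allele reads across all pools and divide by the total read depth. The read counts below are hypothetical.

```python
def raw_count_frequency(minor_reads, total_reads):
    """Raw sequencing-count estimator of a rare variant's allele frequency:
    the pooled minor-allele read fraction across all DNA pools."""
    return sum(minor_reads) / sum(total_reads)

# Hypothetical pools: minor-allele reads and total read depth per pool.
freq = raw_count_frequency([3, 1, 0, 2], [600, 580, 610, 590])
```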

17.
A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting both estimates of migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether either the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation "SDL," for "SNP-discovered locus," in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that their ascertainment can be modeled.

18.
Genetic Analysis Workshop 14 simulated data have been analyzed with MASC (marker association segregation chi-squares) in which we implemented a bootstrap procedure to provide the variation intervals of parameter estimates. We model here the effect of a genetic factor, S, for Kofendrerd Personality Disorder in the region of the marker C03R0281 for the Aipotu population. The goodness of fit of several genetic models with two alleles for one locus has been tested. The data are not compatible with a direct effect of a single-nucleotide polymorphism (SNP) (SNP 16, 17, 18, 19 of pack 153) in the region. Therefore, we can conclude that the functional polymorphism has not been typed and is in linkage disequilibrium with the four studied SNPs. We obtained very large variation intervals both of the disease allele frequency and the degree of dominance. The uncertainty of the model parameters can be explained first, by the method used, which models marginal effects when the disease is due to complex interactions, second, by the presence of different sub-criteria used for the diagnosis that are not determined by S in the same way, and third, by the fact that the segregation of the disease in the families was not taken into account. However, we could not find any model that could explain the familial segregation of the trait, namely the higher proportion of affected parents than affected sibs.

19.
Previous studies have noted that the estimated positions of a large proportion of mapped quantitative trait loci (QTLs) coincide with marker locations and have suggested that this indicates a bias in the mapping methodology. In this study we predict the expected proportion of QTLs with positions estimated to be at the location of a marker and further examine the problem using simulated data. The results show that the higher proportion of putative QTLs estimated to be at marker positions compared with non-marker positions is an expected consequence of the estimation methods. The study initially focused on a single interval with no QTLs and was extended to include multiple intervals and QTLs of large effect. Further, the study demonstrated that the larger proportion of estimated QTL positions at the location of markers was not unique to linear regression mapping. Maximum likelihood produced similar results, although the accumulation of positional estimates at outermost markers was reduced when regions outside the linkage group were also considered. The bias towards marker positions is greatest under the null hypothesis of no QTLs or when QTL effects are small. This study discusses the impact the findings could have on the calculation of thresholds and confidence intervals produced by bootstrap methods.

20.
We assessed complementary log–log (CLL) regression as an alternative statistical model for estimating multivariable‐adjusted prevalence ratios (PR) and their confidence intervals. Using the delta method, we derived an expression for approximating the variance of the PR estimated using CLL regression. Then, using simulated data, we examined the performance of CLL regression in terms of the accuracy of the PR estimates, the width of the confidence intervals, and the empirical coverage probability, and compared it with results obtained from log–binomial regression and stratified Mantel–Haenszel analysis. Within the range of values of our simulated data, CLL regression performed well, with only slight bias of point estimates of the PR and good confidence interval coverage. In addition, and importantly, the computational algorithm did not have the convergence problems occasionally exhibited by log–binomial regression. The technique is easy to implement in SAS (SAS Institute, Cary, NC), and it does not have the theoretical and practical issues associated with competing approaches. CLL regression is an alternative method of binomial regression that warrants further assessment.
