Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
High-resolution mapping is an important step in the identification of complex disease genes. In outbred populations, linkage disequilibrium is expected to operate over short distances and could provide a powerful fine-mapping tool. Here we build on recently developed methods for linkage-disequilibrium mapping of quantitative traits to construct a general approach that can accommodate nuclear families of any size, with or without parental information. Variance components are used to construct a test that utilizes information from all available offspring but that is not biased in the presence of linkage or familiality. A permutation test is described for situations in which maximum-likelihood estimates of the variance components are biased. Simulation studies are used to investigate power and error rates of this approach and to highlight situations in which violations of multivariate normality assumptions warrant the permutation test. The relationship between power and the level of linkage disequilibrium for this test suggests that the method is well suited to the analysis of dense maps. The relationship between power and family structure is investigated, and these results are applicable to study design in complex disease, especially for late-onset conditions for which parents are usually not available. When parental genotypes are available, power does not depend greatly on the number of offspring in each family. Power decreases when parental genotypes are not available, but the loss in power is negligible when four or more offspring per family are genotyped. Finally, it is shown that, when siblings are available, the total number of genotypes required in order to achieve comparable power is smaller if parents are not genotyped.

2.
Regional association analysis is one of the most powerful tools for gene mapping because, instead of analyzing individual variants, it simultaneously considers all variants in a region. Recent development of models for regional association analysis involves the functional data analysis approach. In the framework of this approach, the genotypes of variants within a region, as well as their effects, are described by continuous functions. This approach allows us to use information about both linkage and linkage disequilibrium and to reduce the influence of noise and/or observation errors. Here we define a functional linear mixed model to test association in independent and structured samples. We demonstrate how to test fixed and random effects of a set of genetic variants in a region on a quantitative trait. Evaluation of the statistical properties of the new methods shows that type I error rates are in accordance with declared values and that power is high, especially for models with fixed effects of genotypes. We expect that the new functional linear regression models will facilitate identification of rare genetic variants controlling complex human and animal traits. The new methods are implemented in the software FREGAT, which is available for free download at http://mga.bionet.nsc.ru/soft/FREGAT/.

3.
Summary. Population admixture can be a confounding factor in genetic association studies. Family‐based methods (Rabinowitz and Laird, 2000, Human Heredity 50, 211–223) have been proposed in both testing and estimation settings to adjust for this confounding, especially in case‐only association studies. The family‐based methods rely on conditioning on the observed parental genotypes or on the minimal sufficient statistic for the genetic model under the null hypothesis. In some cases, these methods do not capture all the available information because the conditioning strategy is too stringent. General efficient methods to adjust for population admixture that use all the available information have been proposed (Rabinowitz, 2002, Journal of the American Statistical Association 92, 742–758). However, these approaches may not be easy to implement in some situations. A previously developed easy‐to‐compute approach adjusts for admixture by adding supplemental covariates to linear models (Yang et al., 2000, Human Heredity 50, 227–233). Here it is shown that this strategy of augmenting linear models with appropriate covariates can be combined with the general efficient methods in Rabinowitz (2002) to provide computationally tractable and locally efficient adjustment. After deriving the optimal covariates, the adjusted analysis can be carried out using standard statistical software packages such as SAS or R. The proposed methods enjoy local efficiency in a neighborhood of the true model. The simulation studies show that nontrivial efficiency gains can be obtained by using information not accessible to the methods that rely on conditioning on the minimal sufficient statistics. The approaches are illustrated through an analysis of the influence of apolipoprotein E (APOE) genotype on plasma low‐density lipoprotein (LDL) concentration in children.

4.
A combined logistic regression and life-table analysis is presented on age-at-onset data for Huntington disease. Covariates included in the analysis were sex of the at-risk individual, parental age at onset, and sex of transmitting parent. Parental age at onset and parental sex were found to be significant covariates for age at onset in the offspring, and the appropriate logistic regression functions are calculated by maximum likelihood methods. These regression functions permit a more precise evaluation of carrier risks and likelihoods than hitherto was possible by simple computational means. We further introduce a novel method to account for sibship correlations in the significance assessment, using log-likelihood differences between different models.

5.
The analysis of family-study data sometimes focuses on whether a dichotomous trait tends to cluster in families. For traits with variable age-at-onset, it may be of interest to investigate whether age-at-onset itself also exhibits familial clustering. A complication in such investigations is that censoring by age-at-ascertainment can induce artifactual familial correlation in the age-at-onset of affected members. A further complication can be that sample inclusion criteria involve the affection status of family members. The purpose here is to present an approach to testing for correlation that is not confounded by censoring by age-at-ascertainment and may be applied with a broad range of inclusion criteria. The approach involves regression statistics in which subjects' covariate terms are chosen to reflect age-at-onset information from the subjects' affected family members. The results of analyses of data from a family study of panic disorder illustrate the approach.

6.
We consider the effect of informative missingness on association tests that use parental genotypes as controls and that allow for missing parental data. Parental data can be informatively missing when the probability of a parent being available for study is related to that parent's genotype; when this occurs, the distribution of genotypes among observed parents is not representative of the distribution of genotypes among the missing parents. Many previously proposed procedures that allow for missing parental data assume that these distributions are the same. We propose association tests that behave well when parental data are informatively missing, under the assumption that, for a given trio of paternal, maternal, and affected offspring genotypes, the genotypes of the parents and the sex of the missing parents, but not the genotype of the affected offspring, can affect parental missingness. (This same assumption is required for validity of an analysis that ignores incomplete parent-offspring trios.) We use simulations to compare our approach with previously proposed procedures, and we show that if even small amounts of informative missingness are not taken into account, they can have large, deleterious effects on the performance of tests.

7.
The central issue for Genetic Analysis Workshop 14 (GAW14) is the question of which is the better strategy for linkage analysis: the use of single-nucleotide polymorphisms (SNPs) or of microsatellite markers. To answer this question, we analyzed the simulated data using Duffy's SIB-PAIR program, which can incorporate parental genotypes, and our identity-by-state/identity-by-descent (IBS-IBD) transformation method of affected sib-pair linkage analysis, which uses the matrix transformation between IBS and IBD. The advantages of our method are as follows: the assumption of Hardy-Weinberg equilibrium is not necessary; the parental genotype information may all be unknown; both IBS and its related IBD transformation can be used in the linkage analysis; and the determinant of the IBS-IBD transformation matrix provides a quantitative measure of the quality of the marker in linkage analysis. With the originally distributed simulated data, we found that 1) for microsatellite markers there are virtually no differences in type I and type II error rates whether or not parental genotypes were used; 2) on average, a microsatellite marker has more power than a SNP marker does in linkage detection; 3) if parental genotype information is used, SNP markers show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNP markers show considerable variation in type I error rates for different methods.

8.
Zou G, Pan D, Zhao H. Genetics, 2003, 164(3): 1161-1173
The identification of genotyping errors is an important issue in mapping complex disease genes. Although it is common practice to genotype multiple markers in a candidate region in genetic studies, the potential benefit of jointly analyzing multiple markers to detect genotyping errors has not been investigated. In this article, we discuss genotyping error detections for a set of tightly linked markers in nuclear families, and the objective is to identify families likely to have genotyping errors at one or more markers. We make use of the fact that recombination is a very unlikely event among these markers. We first show that, with family trios, no extra information can be gained by jointly analyzing markers if no phase information is available, and error detection rates are usually low if Mendelian consistency is used as the only standard for checking errors. However, for nuclear families with more than one child, error detection rates can be greatly increased with the consideration of more markers. Error detection rates also increase with the number of children in each family. Because families displaying Mendelian consistency may still have genotyping errors, we calculate the probability that a family displaying Mendelian consistency has correct genotypes. These probabilities can help identify families that, although showing Mendelian consistency, may have genotyping errors. In addition, we examine the benefit of available haplotype frequencies in the general population on genotyping error detections. We show that both error detection rates and the probability that an observed family displaying Mendelian consistency has correct genotypes can be greatly increased when such additional information is available.
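
As a rough illustration of the Mendelian consistency check that the abstract above uses as its baseline, the following sketch tests a nuclear family at a single marker. It is not the authors' joint multi-marker procedure; the function names and the representation of a genotype as an unordered pair of allele labels are assumptions made for this example only.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one from the mother.

    Genotypes are 2-tuples of allele labels, e.g. ("A", "B").
    This checks one marker only (hypothetical helper, not from the paper).
    """
    possible = {tuple(sorted((f, m))) for f, m in product(father, mother)}
    return tuple(sorted(child)) in possible


def family_is_consistent(father, mother, children):
    """A nuclear family is consistent if every child is consistent."""
    return all(is_mendelian_consistent(father, mother, c) for c in children)
```

A family that passes this check can still carry a genotyping error, which is the point the abstract makes when it calculates the probability that a consistent family has correct genotypes.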

9.
10.
Various family-based association methods have recently been proposed that allow testing for linkage in the presence of linkage disequilibrium between a marker and a disease even if there is only incomplete parental-genotype information. For some families, it may be possible to reconstruct missing parental genotypes from the genotypes of their offspring. Treating such a reconstructed family as if parental genotypes have been typed, however, can introduce bias. The reconstruction-combined transmission/disequilibrium test (RC-TDT) and its X-chromosomal counterpart, XRC-TDT, employ parental-genotype reconstruction and correct for the biases involved in this reconstruction without relying on population marker allele frequencies. For the two tests, exact P values can be obtained by numerically calculating the convolution of the null distributions corresponding to the families in the sample.

11.
Missing data occur in genetic association studies for several reasons including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies, because of possible loss of robustness to confounding by population stratification. Here a novel likelihood for nuclear families is proposed, in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and has only minor loss of robustness when there are missing data. It also allows a novel conditioning step that gives valid analysis for multiple offspring in the presence of linkage. Unrelated subjects are included by regarding them as the children of two missing parents. Simulations and theory indicate similar operating characteristics to TRANSMIT, but with no bias with missing data in the presence of linkage. In comparison with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. In comparison to APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in software, UNPHASED, available from the author.

12.
Researchers conducting family-based association studies have a wide variety of transmission/disequilibrium (TD)-based methods to choose from, but few guidelines exist in the selection of a particular method to apply to available data. Using a simulation study design, we compared the power and type I error of eight popular TD-based methods under different family structures, frequencies of missing parental data, genetic models, and population stratifications. No method was uniformly most powerful under all conditions, but type I error was appropriate for nearly every test statistic under all conditions. Power varied widely across methods, with a 46.5% difference in power observed between the most powerful and the least powerful method when 50% of families consisted of an affected sib pair and one parent genotyped under an additive genetic model and a 35.2% difference when 50% of families consisted of a single affection-discordant sibling pair without parental genotypes available under an additive genetic model. Methods were generally robust to population stratification, although some slightly less so than others. The choice of a TD-based test statistic should be dependent on the predominant family structure ascertained, the frequency of missing parental genotypes, and the assumed genetic model.
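
For orientation, the classic biallelic transmission/disequilibrium test on which TD-based methods build can be sketched as below. The abstract does not list its eight methods, so this is only the textbook McNemar-type statistic, not any one of them; the function name and the count arguments are assumptions for the example. Counts refer to transmissions from heterozygous parents to affected offspring.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classic TDT for one allele at a biallelic marker.

    transmitted:   times heterozygous parents passed the allele on
    untransmitted: times heterozygous parents did not pass it on
    Returns (chi2, p) with chi2 = (b - c)^2 / (b + c), referred to a
    chi-square distribution with 1 degree of freedom.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

This basic form needs both parental genotypes; handling missing parents, sibships, and other family structures is where the compared methods differ.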

13.
A stepwise logistic-regression procedure is proposed for evaluation of the relative importance of variants at different sites within a small genetic region. By fitting statistical models with main effects, rather than modeling the full haplotype effects, we generate tests, with few degrees of freedom, that are likely to be powerful for detecting primary etiological determinants. The approach is applicable to either case/control or nuclear-family data, with case/control data modeled via unconditional and family data via conditional logistic regression. Four different conditioning strategies are proposed for evaluation of effects at multiple, closely linked loci when family data are used. The first strategy results in a likelihood that is equivalent to analysis of a matched case/control study with each affected offspring matched to three pseudocontrols, whereas the second strategy is equivalent to matching each affected offspring with between one and three pseudocontrols. Both of these strategies require that one be able to infer parental phase (i.e., those haplotypes present in the parents). Families in which phase cannot be determined must be discarded, which can considerably reduce the effective size of a data set, particularly when large numbers of loci that are not very polymorphic are being considered. Therefore, a third strategy is proposed in which knowledge of parental phase is not required, which allows those families with ambiguous phase to be included in the analysis. The fourth and final strategy is to use conditioning method 2 when parental phase can be inferred and to use conditioning method 3 otherwise. The methods are illustrated using nuclear-family data to evaluate the contribution of loci in the HLA region to the development of type 1 diabetes.

14.
In spite of the potential evolutionary importance of parental effects, many aspects of these effects remain inadequately explained. This paper explores both their causes and potential consequences for the evolution of life-history traits in plants. In a growth chamber experiment, I manipulated the pre- and postzygotic temperatures of both parents of controlled crosses of Plantago lanceolata. All offspring traits were affected by parental temperature. On average, low parental temperature increased seed weight, reduced germination and offspring growth rate, and accelerated onset of reproduction by 7%, 50%, 5%, and 47%, respectively, when compared to the effects of high parental temperature. Both pre- and postzygotic parental temperatures (i.e., prior to fertilization vs. during fertilization and seed set, respectively) influenced offspring traits but not always in the same direction. In all cases, however, the postzygotic effect was stronger. The prezygotic effects were more often transmitted paternally than maternally. Growth and onset of reproduction were influenced both directly by parental temperature as well as indirectly via the effects of parental temperature on seed weight and germination. Significant interactions between parental genotypes and prezygotic temperature treatment (G × E interactions) show that genotypes differ in their intergenerational responses to temperature with respect to germination and growth. The data suggest that temperature is involved in both genetically based and environmentally induced parental effects and that parental temperature may accelerate the rate of evolutionary change in flowering time in natural populations of P. lanceolata. The environmentally induced temperature effects, as mediated through G × (prezygotic) E interactions are not likely to affect the rate or direction of evolutionary change in the traits examined because postzygotic temperature effects greatly exceed prezygotic effects.

15.
Many traits of evolutionary interest, when placed in their developmental, physiological, or environmental contexts, are function-valued. For instance, gene expression during development is typically a function of the age of an organism and physiological processes are often a function of environment. In comparative and experimental studies, a fundamental question is whether the function-valued trait of one group is different from another. To address this question, evolutionary biologists have several statistical methods available. These methods can be classified into one of two types: multivariate and functional. Multivariate methods, including univariate repeated-measures analysis of variance (ANOVA), treat each trait as a finite list of data. Functional methods, such as repeated-measures regression, view the data as a sample of points drawn from an underlying function. A key difference between multivariate and functional methods is that functional methods retain information about the ordering and spacing of a set of data values, information that is discarded by multivariate methods. In this study, we evaluated the importance of that discarded information in statistical analyses of function-valued traits. Our results indicate that functional methods tend to have substantially greater statistical power than multivariate approaches to detect differences in a function-valued trait between groups.

16.
On marker-assisted prediction of genetic value: beyond the ridge   (Total citations: 6; self-citations: 0; citations by others: 6)
Gianola D, Perez-Enciso M, Toro MA. Genetics, 2003, 163(1): 347-365
Marker-assisted genetic improvement of agricultural species exploits statistical dependencies in the joint distribution of marker genotypes and quantitative traits. An issue is how molecular (e.g., dense marker maps) and phenotypic information (e.g., some measure of yield in plants) is to be used for predicting the genetic value of candidates for selection. Multiple regression, selection index techniques, best linear unbiased prediction, and ridge regression of phenotypes on marker genotypes have been suggested, as well as more elaborate methods. Here, phenotype-marker associations are modeled hierarchically via multilevel models including chromosomal effects, a spatial covariance of marker effects within chromosomes, background genetic variability, and family heterogeneity. Lorenz curves and Gini coefficients are suggested for assessing the inequality of the contribution of different marker effects to genetic variability. Classical and Bayesian methods are presented. The Bayesian approach includes a Markov chain Monte Carlo implementation. The generality and flexibility of the Bayesian method is illustrated when a Lorenz curve is to be inferred.
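
To make the Lorenz curve and Gini coefficient mentioned above concrete, here is a minimal sketch that takes a list of non-negative per-marker contributions to genetic variability. How those contributions are obtained (for example, from estimated marker effects) is not specified here and is an assumption of the example, as are the function names; the paper's hierarchical and Bayesian machinery is not reproduced.

```python
def lorenz_curve(contributions):
    """Return Lorenz curve points (share of markers, share of total).

    contributions: non-negative per-marker contributions (hypothetical
    input, e.g. variance explained by each marker).
    """
    values = sorted(contributions)
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("need at least one positive contribution")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(values, start=1):
        running += v
        points.append((i / len(values), running / total))
    return points


def gini_coefficient(contributions):
    """Gini = 1 - 2 * (area under the Lorenz curve), trapezoid rule."""
    pts = lorenz_curve(contributions)
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area
```

A value near 0 means the markers contribute about equally; a value near 1 means a few markers account for most of the variability.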

17.
The uptake of genomic selection (GS) by the swine industry is still limited by the costs of genotyping. A feasible alternative to overcome this challenge is to genotype animals using an affordable low-density (LD) single nucleotide polymorphism (SNP) chip panel followed by accurate imputation to a high-density panel. Therefore, the main objective of this study was to screen incremental densities of LD panels in order to systematically identify one that balances the tradeoffs among imputation accuracy, prediction accuracy of genomic estimated breeding values (GEBVs), and genotype density (directly associated with genotyping costs). Genotypes using the Illumina Porcine60K BeadChip were available for 1378 Duroc (DU), 2361 Landrace (LA) and 3192 Yorkshire (YO) pigs. In addition, pseudo-phenotypes (de-regressed estimated breeding values) for five economically important traits were provided for the analysis. The reference population for genotype imputation consisted of 931 DU, 1631 LA and 2103 YO animals, and the remaining individuals were included in the validation population of each breed. An LD panel of 3000 evenly spaced SNPs (LD3K) yielded high imputation accuracy rates: 93.78% (DU), 97.07% (LA) and 97.00% (YO) and high correlations (>0.97) between the predicted GEBVs using the actual 60 K SNP genotypes and the imputed 60 K SNP genotypes for all traits and breeds. The imputation accuracy was influenced by the reference population size as well as the amount of parental genotype information available in the reference population. However, parental genotype information became less important when the LD panel had at least 3000 SNPs. The correlation of the GEBVs directly increased with an increase in imputation accuracy. When genotype information for both parents was available, a panel of 300 SNPs (imputed to 60 K) yielded GEBV predictions highly correlated (⩾0.90) with genomic predictions obtained based on the true 60 K panel, for all traits and breeds.
For a small reference population with no parents in the reference population, a panel at least as dense as the LD3K is recommended; when both parents are in the reference population, a panel as small as the LD300 might be a feasible option. These findings are of great importance for the development of LD panels for swine in order to reduce genotyping costs, increase the uptake of GS and, therefore, optimize the profitability of the swine industry.

18.
Summary. We propose a similarity-based regression method to detect associations between traits and multimarker genotypes. The model regresses similarity in traits for pairs of "unrelated" individuals on their haplotype similarities, and detects the significance by a score test for which the limiting distribution is derived. The proposed method allows for covariates, uses phase-independent similarity measures to bypass the need to impute phase information, and is applicable to traits of general types (e.g., quantitative and qualitative traits). We also show that the gene-trait similarity regression is closely connected with random effects haplotype analysis, although commonly they are considered as separate modeling tools. This connection unites the classic haplotype sharing methods with the variance-component approaches, which enables direct derivation of analytical properties of the sharing statistics even when the similarity regression model becomes analytically challenging.

19.
Summary. We derive regression estimators that can compare longitudinal treatments using only the longitudinal propensity scores as regressors. These estimators, which assume knowledge of the variables used in the treatment assignment, are important for reducing the large dimension of covariates for two reasons. First, if the regression models on the longitudinal propensity scores are correct, then our estimators share advantages of correctly specified model‐based estimators, a benefit not shared by estimators based on weights alone. Second, if the models are incorrect, the misspecification can be more easily limited through model checking than with models based on the full covariates. Thus, our estimators can also be better when used in place of the regression on the full covariates. We use our methods to compare longitudinal treatments for type II diabetes mellitus.

20.
David L. Remington. Genetics, 2009, 181(3): 1087-1099
The use of high-throughput genomic techniques to map gene expression quantitative trait loci has spurred the development of path analysis approaches for predicting functional networks linking genes and natural trait variation. The goal of this study was to test whether potentially confounding factors, including effects of common environment and genes not included in path models, affect predictions of cause–effect relationships among traits generated by QTL path analyses. Structural equation modeling (SEM) was used to test simple QTL-trait networks under different regulatory scenarios involving direct and indirect effects. SEM identified the correct models under simple scenarios, but when common-environment effects were simulated in conjunction with direct QTL effects on traits, they were poorly distinguished from indirect effects, leading to false support for indirect models. Application of SEM to loblolly pine QTL data provided support for biologically plausible a priori hypotheses of QTL mechanisms affecting height and diameter growth. However, some biologically implausible models were also well supported. The results emphasize the need to include any available functional information, including predictions for genetic and environmental correlations, to develop plausible models if biologically useful trait network predictions are to be made.
