Similar literature
20 similar articles found (search time: 15 ms)
1.
Under additive inheritance, the Henderson mixed model equations (HMME) provide an efficient approach to obtaining genetic evaluations by marker assisted best linear unbiased prediction (MABLUP) given pedigree relationships, trait and marker data. For large pedigrees with many missing markers, however, it is not feasible to calculate the exact gametic variance covariance matrix required to construct HMME. The objective of this study was to investigate the consequences of using approximate gametic variance covariance matrices on response to selection by MABLUP. Two methods were used to generate approximate variance covariance matrices. The first method (Method A) completely discards the marker information for individuals with an unknown linkage phase between two flanking markers. The second method (Method B) makes use of the marker information at only the most polymorphic marker locus for individuals with an unknown linkage phase. Data sets were simulated with and without missing marker data for flanking markers with 2, 4, 6, 8 or 12 alleles. Several missing marker data patterns were considered. The genetic variability explained by marked quantitative trait loci (MQTL) was modeled with one or two MQTL of equal effect. Response to selection by MABLUP using Method A or Method B was compared with that obtained by MABLUP using the exact genetic variance covariance matrix, which was estimated using 15 000 samples from the conditional distribution of genotypic values given the observed marker data. For the simulated conditions, the superiority of MABLUP over BLUP based only on pedigree relationships and trait data varied between 0.1% and 13.5% for Method A, between 1.7% and 23.8% for Method B, and between 7.6% and 28.9% for the exact method. The relative performance of the methods under investigation was not affected by the number of MQTL in the model.
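As a rough illustration of what "constructing the mixed model equations" involves, the sketch below solves Henderson's equations for a generic model y = Xb + Zu + e. It is not the MABLUP model of the abstract: the covariance structure `G`, the variance ratio `lam`, the function name `solve_mme` and the toy data are all assumptions made for illustration, with `G` standing in for whatever (exact or approximate) covariance matrix is available.

```python
import numpy as np

def solve_mme(X, Z, y, G, lam):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e.

    G   : covariance structure of the random effects u (assumed known, invertible)
    lam : variance ratio sigma_e^2 / sigma_u^2 (assumed known)
    Returns (b_hat, u_hat): fixed-effect estimates and predicted random effects.
    """
    G_inv = np.linalg.inv(G)
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * G_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# Toy data (made up): five records on three individuals, overall mean only.
X = np.ones((5, 1))
Z = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
y = np.array([4.5, 2.9, 3.9, 3.5, 5.0])
G = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.25],
              [0.25, 0.25, 1.00]])

b_hat, u_hat = solve_mme(X, Z, y, G, lam=2.0)
```

Only the inverse of `G` enters the equations, which is why an approximation to the covariance matrix changes the predictions `u_hat` and, through them, the selection decisions.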

2.
3.
M C Bink, J A Van Arendonk. Genetics 1999, 151(1):409-420
Augmentation of marker genotypes for ungenotyped individuals is implemented in a Bayesian approach via the use of Markov chain Monte Carlo techniques. Marker data on relatives and phenotypes are combined to compute conditional posterior probabilities for marker genotypes of ungenotyped individuals. The presented procedure allows the analysis of complex pedigrees with ungenotyped individuals to detect segregating quantitative trait loci (QTL). Allelic effects at the QTL were assumed to follow a normal distribution with a covariance matrix based on known QTL position and identity by descent probabilities derived from flanking markers. The Bayesian approach estimates variance due to the single QTL, together with polygenic and residual variance. The method was empirically tested through analyzing simulated data from a complex granddaughter design. Ungenotyped dams were related to one or more sons or grandsires in the design. Heterozygosity of the marker loci and size of QTL were varied. Simulation results indicated a significant increase in power when ungenotyped dams were included in the analysis.

4.
Pedigree-free animal models: the relatedness matrix reloaded   Total citations: 1, self-citations: 0, citations by others: 1
Animal models typically require a known genetic pedigree to estimate quantitative genetic parameters. Here we test whether animal models can alternatively be based on estimates of relatedness derived entirely from molecular marker data. Our case study is the morphology of a wild bird population, for which we report estimates of the genetic variance-covariance matrices (G) of six morphological traits using three methods: the traditional animal model; a molecular marker-based approach to estimate heritability based on Ritland's pairwise regression method; and a new approach using a molecular genealogy arranged in a relatedness matrix (R) to replace the pedigree in an animal model. Using the traditional animal model, we found significant genetic variance for all six traits and positive genetic covariance among traits. The pairwise regression method did not return reliable estimates of quantitative genetic parameters in this population, with estimates of genetic variance and covariance typically being very small or negative. In contrast, we found mixed evidence for the use of the pedigree-free animal model. Similar to the pairwise regression method, the pedigree-free approach performed poorly when the full-rank R matrix based on the molecular genealogy was employed. However, performance improved substantially when we reduced the dimensionality of the R matrix in order to maximize the signal to noise ratio. Using reduced-rank R matrices generated estimates of genetic variance that were much closer to those from the traditional model. Nevertheless, this method was less reliable at estimating covariances, which were often estimated to be negative. Taken together, these results suggest that pedigree-free animal models can recover quantitative genetic information, although the signal remains relatively weak. 
It remains to be determined whether this problem can be overcome by the use of a more powerful battery of molecular markers and improved methods for reconstructing genealogies.
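The abstract above does not say how the dimensionality of the R matrix was reduced. One common way to build a reduced-rank version of a symmetric matrix, assumed here purely for illustration, is to keep only its largest eigenvalues and the matching eigenvectors. The function name `reduce_rank` and the toy matrix are hypothetical.

```python
import numpy as np

def reduce_rank(R, k):
    """Rank-k approximation of a symmetric matrix R.

    Keeps the k largest eigenvalues and their eigenvectors (an assumed
    reduction scheme, not necessarily the one used in the study).
    """
    w, V = np.linalg.eigh(R)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest
    return (V[:, idx] * w[idx]) @ V[:, idx].T

# Toy relatedness matrix for four individuals (made-up values).
R = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])

R2 = reduce_rank(R, 2)
```

Discarding the small eigenvalues removes the directions that contribute least to the matrix, which is one way to read "maximize the signal to noise ratio" in the abstract.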

5.
Molecular markers make it possible to estimate the pairwise relatedness between the members of a breeding pool when their selection history is no longer available or has become too complex for a classical pedigree analysis. The field of population genetics has several estimation procedures at its disposal, but when the genotyped individuals are highly selected inbred lines, their application is not warranted, as the theoretical assumptions on which these estimators were built, usually linkage equilibrium between marker loci or even Hardy–Weinberg equilibrium, are not met. An alternative approach requires the availability of a genotyped reference set of inbred lines, which makes it possible to correct the observed marker similarities for their inherent upward bias when used as a coancestry measure. However, this approach does not guarantee that the resulting coancestry matrix is at least positive semi-definite (psd), a necessary condition for its use as a covariance matrix. In this paper we present the weighted alikeness in state (WAIS) estimator. This marker-based coancestry estimator is compared to several other commonly applied relatedness estimators under realistic hybrid breeding conditions in a number of simulations. We also fit a linear mixed model to phenotypic data from a commercial maize breeding programme and compare the likelihood of the different variance structures. WAIS is shown to be psd, which makes it suitable for modelling the covariance between genetic components in linear mixed models involved in breeding value estimation or association studies. Results indicate that it generally produces a low root mean squared error under different breeding circumstances and provides a fit to the data that is comparable to that of several other marker-based alternatives. Recommendations for each of the examined coancestry measures are provided.
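The positive semi-definiteness condition mentioned above can be checked numerically. The sketch below is a generic test, not part of the WAIS estimator; the function name `is_psd`, the tolerance and the two toy matrices are assumptions for illustration.

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """Return True if K is symmetric with no eigenvalue below -tol."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# A valid covariance-like matrix.
valid = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Pairwise values that look plausible one at a time but are jointly
# inconsistent: lines 1 and 3 are both close to line 2, yet strongly
# dissimilar to each other.
inconsistent = np.array([[ 1.0, 0.9, -0.9],
                         [ 0.9, 1.0,  0.9],
                         [-0.9, 0.9,  1.0]])

valid_ok = is_psd(valid)
inconsistent_ok = is_psd(inconsistent)
```

The second matrix shows why estimating coancestry pair by pair gives no guarantee: every entry lies in a sensible range, but the matrix as a whole cannot be a covariance matrix.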

6.
Fan R, Jung J. Human Heredity 2002, 54(3):132-150
In this paper, we extend association study methods of both Fan et al. [Hum Hered 2002;53:130-145], in which a quantitative trait locus (QTL) and a multi-allele marker are considered for trio families, and Fan and Xiong [Biostatistics 2003, in press], in which a QTL and a bi-allelic marker are considered for nuclear families. The objective is to build mixed models for association study between a QTL and a multi-allelic marker for nuclear families with any number of offspring. Two types of nuclear family data are considered: the first is genetic data of offspring from at least one heterozygous parent, and the second is genetic data of offspring of nuclear families. (1) For the data of offspring from at least one heterozygous parent, we assume that at least one parent is heterozygous at the marker locus, and we may infer clearly the transmission of parental marker alleles to the offspring. We show that it can be used in association study in the presence of linkage. The theoretical basis is the difference between the conditional mean of trait value given an allele is transmitted and the conditional mean of trait value given the allele is not transmitted from a heterozygous parent. To build valid models, we calculate the variance covariance structure of trait values of offspring. Besides, the reduction of the number of parameters is discussed under an assumption of tight linkage between the trait locus and the marker. (2) For the data of offspring of nuclear families, we show that it can be used in general association study. In this case, the theoretical basis is the difference between the conditional mean of trait values given an allele is transmitted from a parent and the population mean. Then, we calculate variance-covariance structure of trait values of offspring. (3) Based on the theoretical analysis, mixed models are built for each type of the data, and related test statistics are proposed for association study.
By power calculation and comparison, we show that, in some instances, the proposed test statistics have higher power than those obtained by collapsing alleles into new ones. The proposed models are used to analyze the chromosome 4 and chromosome 16 data of the Oxford asthma data, Genetic Analysis Workshop 12.

7.
Variance component (VC) approaches based on restricted maximum likelihood (REML) have been used as an attractive method for positioning of quantitative trait loci (QTL). Linkage disequilibrium (LD) information can be easily implemented in the covariance structure among QTL effects (e.g. genotype relationship matrix) and mapping resolution appears to be high. Because of the use of LD information, the covariance structure becomes much richer and denser compared to the use of linkage information alone. This makes an average information (AI) REML algorithm based on mixed model equations and sparse matrix techniques less useful. In addition, (near-) singularity problems often occur with high marker densities, which are common in fine-mapping, causing numerical problems in AIREML based on mixed model equations. The present study investigates the direct use of the variance covariance matrix of all observations in AIREML for LD mapping with a general complex pedigree. The method presented is more efficient than the usual approach based on mixed model equations and robust to numerical problems caused by near-singularity due to closely linked markers. It is also feasible to fit multiple QTL simultaneously in the proposed method, whereas this would drastically increase computing time when using mixed model equation-based methods.

8.
Huang YH, Lee MH, Chen WJ, Hsiao CK. PLoS ONE 2011, 6(7):e21890
Haplotype association studies based on family genotype data can provide more biological information than single marker association studies. Difficulties arise, however, in the inference of haplotype phase determination and in haplotype transmission/non-transmission status. Incorporation of the uncertainty associated with haplotype inference into regression models requires special care. This task can get even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationship among haplotypes and retain for regression analysis only the ancestral core haplotypes identified by it. To integrate the three sources of variation, phase ambiguity, transmission status and ancestral uncertainty, we propose an uncertainty-coding matrix which combines these three types of variability simultaneously. Next we evaluate haplotype risk with the use of such a matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented and the results are compared with those from other family based analysis tools such as FBAT. Our proposed method (Bayesian regression using uncertainty-coding matrix, BRUCM) is shown to perform better and the implementation in R is freely available.

9.
The transmission disequilibrium test (TDT) has been utilized to test the linkage and association between a genetic trait locus and a marker. Spielman et al. (1993) introduced TDT to test linkage between a qualitative trait and a marker in the presence of association. In the presence of linkage, TDT can be applied to test for association for fine mapping (Martin et al., 1997; Spielman and Ewens, 1996). In recent years, extensive research has been carried out on the TDT between a quantitative trait and a marker locus (Allison, 1997; Fan et al., 2002; George et al., 1999; Rabinowitz, 1997; Xiong et al., 1998; Zhu and Elston, 2000, 2001). The original TDT for both qualitative and quantitative traits requires unrelated offspring of heterozygous parents for analysis, and much research has been carried out to extend it to fit for different settings. For nuclear families with multiple offspring, one approach is to treat each child independently for analysis. Obviously, this may not be a valid method since offspring of one family are related to each other. Another approach is to select one offspring randomly from each family for analysis. However, with this method much information may be lost. Martin et al. (1997, 2000) constructed useful statistical tests to analyse the data for qualitative traits. In this paper, we propose to use mixed models to analyse sample data of nuclear families with multiple offspring for quantitative traits according to the models in Amos (1994). The method uses data of all offspring by taking into account their trait mean and variance-covariance structures, which contain all the effects of major gene locus, polygenic loci and environment. A test statistic based on mixed models is shown to be more powerful than the test statistic proposed by George et al. (1999) under moderate disequilibrium for nuclear families. 
Moreover, it has higher power than the TDT statistic constructed by randomly choosing a single offspring from each nuclear family.

10.
Generalized estimating equations, used to analyse longitudinal data with continuous or discrete responses, require a correlation matrix to be specified in advance for the iterative procedure that estimates the regression parameters. This matrix is called the working correlation matrix, and specifying it incorrectly produces less efficient estimates of the model parameters. This study therefore proposes a criterion for selecting the working correlation matrix, based on the covariance matrix estimates of correlated responses resulting from the limiting values of the association parameter estimates. To validate the criterion, we ran simulation studies with normal and binary correlated responses. Compared with some criteria in the literature, the proposed criterion performed better when the exchangeable structure was the true correlation structure in the simulated samples; for large samples, it behaved similarly to the other criteria, with higher success rates.

11.
Genetic and phenotypic variance/covariance matrices are a fundamental measure of the amount of variation and the pattern of association among traits for current investigations in evolutionary biology. Still, few methods have been developed to pinpoint the traits in which two matrices differ most, which hampers further work in the field. Here we describe a novel method for dissecting matrix comparisons. This method is called Selection Response Decomposition (SRD) and is an extension of random skewers, in the sense that evolutionary responses produced by known simulated selection vectors are unfolded and then compared in terms of the direct and indirect responses to selection for any trait. We also applied the method in diverse case studies, illustrating its potential. Both theoretical matrices and empirical biological data were used in the comparisons made. In the theoretical ones, the method was able to determine exactly which traits were responsible for the differences known a priori between the matrices, as well as where matrices remained similar to each other. Similar support could be observed in comparisons carried out between matrices produced from empirical biological data, since reasonable and detailed interpretations could be made regarding matrix comparisons. SRD represents an excellent tool for matrix comparisons and should provide quantitative evolutionary biology with a new method for analyzing and comparing variance/covariance patterns.
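For orientation, the random skewers idea that the abstract builds on can be sketched as follows: random unit selection vectors β are applied to two matrices, the responses Gβ are computed, and the two responses are compared by their vector correlation. This is a sketch of plain random skewers only, not of the decomposition proposed above; the function name, the number of vectors and the toy matrices are assumptions.

```python
import numpy as np

def random_skewers(G1, G2, n_vectors=1000, seed=0):
    """Mean vector correlation between responses G1 @ beta and G2 @ beta
    over random unit-length selection vectors beta."""
    rng = np.random.default_rng(seed)
    betas = rng.normal(size=(n_vectors, G1.shape[0]))
    betas /= np.linalg.norm(betas, axis=1, keepdims=True)
    r1 = betas @ G1.T                     # each row is G1 @ beta
    r2 = betas @ G2.T                     # each row is G2 @ beta
    cos = np.sum(r1 * r2, axis=1) / (
        np.linalg.norm(r1, axis=1) * np.linalg.norm(r2, axis=1))
    return float(cos.mean())

# Two toy 2-trait matrices (made up) that differ in the sign of the covariance.
G_pos = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
G_neg = np.array([[1.0, -0.5],
                  [-0.5, 1.0]])

same = random_skewers(G_pos, G_pos)
different = random_skewers(G_pos, G_neg)
```

A single averaged correlation like `different` says how similar two matrices are overall but not which traits drive the difference, which is the gap the abstract describes.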

12.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to adjust for multiple comparisons when studying brain‐phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF)‐based tools can have inflated family‐wise error rates (FWERs). This has led to substantial controversy as to which processing choices are necessary to control the FWER using GRF‐based SEI. The failure of GRF‐based methods is due to unrealistic assumptions about the spatial covariance function of the imaging data. A permutation procedure is the most robust SEI tool because it estimates the spatial covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi‐) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the spatial covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use the methods to study the association between performance and executive functioning in a working memory functional magnetic resonance imaging study. The sPBJ has similar or greater power to the PBJ and permutation procedures while maintaining the nominal type 1 error rate in reasonable sample sizes. We provide an R package to perform inference using the PBJ and sPBJ procedures.

13.
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well if the data exhibit heavy tails or outliers. To address this challenge, a new robust FPCA approach based on a functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced. We propose robust estimation procedures for eigenfunctions and eigenvalues. Theoretical properties of the PASS operator are established, showing that it adopts the same eigenfunctions as the standard covariance operator and also allows recovering ratios between eigenvalues. We also extend the proposed procedure to handle functional data measured with noise. Compared to existing robust FPCA approaches, the proposed PASS FPCA requires weaker distributional assumptions to conserve the eigenspace of the covariance function. Specifically, existing work is often built upon a class of functional elliptical distributions, which inherently requires symmetry. In contrast, we introduce a class of distributions called the weakly functional coordinate symmetry (weakly FCS), which allows for severe asymmetry and is much more flexible than the functional elliptical distribution family. The robustness of the PASS FPCA is demonstrated via extensive simulation studies, especially its advantages in scenarios with nonelliptical distributions. The proposed method was motivated by and applied to analysis of accelerometry data from the Objective Physical Activity and Cardiovascular Health Study, a large-scale epidemiological study to investigate the relationship between objectively measured physical activity and cardiovascular health among older women.

14.
Since metabolome data are derived from the underlying metabolic network, reverse engineering of such data to recover the network topology is of wide interest. The Lyapunov equation constrains the link between data and network by coupling the covariance of the data with the strength of interactions (the Jacobian matrix). This equation, when expressed as a linear set of equations at steady state, constitutes a basis to infer the network structure given the covariance matrix of the data. The sparse structure of metabolic networks points to reactions which are active based on minimal enzyme production, hinting at sparsity as a cellular objective. Therefore, for a given covariance matrix, we solved the Lyapunov equation for the Jacobian matrix, using minimization of the Euclidean norm of the residuals and maximization of sparsity (the number of zeros in the Jacobian matrix) simultaneously as objective functions, to infer directed small-scale networks from three kingdoms of life (bacteria, fungi, mammalian). The inference performance of the approach was found to be promising, with a false positive rate of zero and a true positive rate of almost one. The effect of missing data on results was additionally analyzed, revealing superiority over similarity-based approaches, which infer undirected networks. Our findings suggest that the covariance of metabolome data implies an underlying network with the sparsest pattern. The theoretical analysis forms a framework for further investigation of sparsity-based inference of metabolic networks from real metabolome data.
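The abstract does not write the Lyapunov equation out. A form commonly used in the metabolomics literature, assumed here, is J C + C Jᵀ = −2D, where J is the Jacobian, C the covariance matrix of the data and D a fluctuation matrix. The sketch below covers only the forward direction (computing C from a known J and D) by rewriting the equation as a linear system. The function name and toy matrices are hypothetical, and the sparsity-based inverse problem of the study is not reproduced.

```python
import numpy as np

def covariance_from_jacobian(J, D):
    """Solve J C + C J^T = -2 D for C via column-stacking (vec).

    Uses vec(J C) = (I kron J) vec(C) and vec(C J^T) = (J kron I) vec(C).
    Assumes J is stable (all eigenvalues have negative real part).
    """
    n = J.shape[0]
    I = np.eye(n)
    A = np.kron(I, J) + np.kron(J, I)
    rhs = (-2.0 * D).flatten(order="F")   # column-major vec
    return np.linalg.solve(A, rhs).reshape((n, n), order="F")

# Toy two-metabolite system (made-up values): metabolite 2 influences metabolite 1.
J = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])
D = np.diag([0.1, 0.2])

C = covariance_from_jacobian(J, D)
residual = J @ C + C @ J.T + 2.0 * D
```

Because C is symmetric, the equation supplies only n(n+1)/2 independent constraints on the n² entries of J. Going from C back to J is therefore underdetermined, which is where an extra criterion such as sparsity comes in.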

15.
Robust PCA and classification in biosciences   Total citations: 7, self-citations: 0, citations by others: 7
MOTIVATION: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good results in the presence of outlying measurements. RESULTS: First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. This ROBPCA method is applied to several bio-chemical datasets. In one example, we also apply a robust discriminant method on the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY: All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.

16.
A novel multitrait fine-mapping method is presented. The method is implemented by a model that treats QTL effects as random variables. The covariance matrix of allelic effects is proportional to the IBD matrix, where each element is the probability that a pair of alleles is identical by descent, given marker information and QTL position. These probabilities are calculated on the basis of similarities of marker haplotypes of individuals of the first generation of genotyped individuals, using "gene dropping" (linkage disequilibrium) and transmission of markers from genotyped parents to genotyped offspring (linkage). A small simulation study based on a granddaughter design was carried out to illustrate that the method provides accurate estimates of QTL position. Results from the simulation also indicate that it is possible to distinguish between a model postulating one pleiotropic QTL affecting two traits vs. one postulating two closely linked loci, each affecting one of the traits.

17.

Introduction

Virtually all existing expectation-maximization (EM) algorithms for quantitative trait locus (QTL) mapping overlook the covariance structure of genetic effects, even though this information can help enhance the robustness of model-based inferences.

Results

Here, we propose fast EM and pseudo-EM-based procedures for Bayesian shrinkage analysis of QTLs, designed to accommodate the posterior covariance structure of genetic effects through a block-updating scheme, that is, by updating all genetic effects simultaneously through many cycles of iterations.

Conclusion

Simulation results based on computer-generated and real-world marker data demonstrated the ability of our method to swiftly produce sensible results regarding the phenotype-to-genotype association. Our new method provides a robust and remarkably fast alternative to full Bayesian estimation in high-dimensional models where the computational burden associated with Markov chain Monte Carlo simulation is often unwieldy. The R code used to fit the model to the data is provided in the online supplementary material.

18.
There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits.
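The principal-components step mentioned above can be sketched generically: centre the marker matrix, take its singular value decomposition, and use the leading component scores as covariates. How the study codes markers, how many components it keeps and how it handles relatives are not stated in the abstract, so the 0/1/2 allele-count coding, the function name and the simulated data below are assumptions.

```python
import numpy as np

def top_principal_components(M, k):
    """Scores of the first k principal components of a marker matrix.

    M is an (individuals x markers) array; a 0/1/2 allele-count coding
    is assumed here for illustration.
    """
    centered = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]

# Simulated toy data: two subpopulations with very different allele frequencies.
rng = np.random.default_rng(1)
group_a = rng.binomial(2, 0.1, size=(20, 50))
group_b = rng.binomial(2, 0.9, size=(20, 50))
markers = np.vstack([group_a, group_b])

pcs = top_principal_components(markers, 2)
```

In this toy example the first column of `pcs` separates the two groups, so including it as a covariate in the association model would absorb the frequency difference between them.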

19.
Linkage analysis based on identity-by-descent allele-sharing can be used to identify a chromosomal region harboring a quantitative trait locus (QTL), but lacks the resolution required for gene identification. Consequently, linkage disequilibrium (association) analysis is often employed for fine-mapping. Variance-components based combined linkage and association analysis for quantitative traits in sib pairs, in which association is modeled as a mean effect and linkage is modeled in the covariance structure, has been extended to general pedigrees (quantitative transmission disequilibrium test, QTDT). The QTDT approach accommodates data not only from parents and siblings, but also from all available relatives. QTDT is also robust to population stratification. However, when population stratification is absent, it is possible to utilize even more information, namely the additional information contained in the founder genotypes. In this paper, we introduce a simple modification of the allelic transmission scoring method used in the QTDT that results in a more powerful test of linkage disequilibrium, but is only applicable in the absence of population stratification. This test, the quantitative trait linkage disequilibrium (QTLD) test, has been incorporated into a new procedure in the statistical genetics computer package SOLAR. We apply this procedure in a linkage/association analysis of an electrophysiological measurement previously shown to be related to alcoholism. We also demonstrate by simulation the increase in power obtained with the QTLD test, relative to the QTDT, when a true association exists between a marker and a QTL.

20.
Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is suboptimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy in these settings, existing methods typically either assume that the population covariance matrix has some particular structure, for example, sparsity, or apply shrinkage to better estimate the population eigenvalues. In this paper, we study a new approach to estimating high-dimensional covariance matrices. We first frame covariance matrix estimation as a compound decision problem. This motivates defining a class of decision rules and using a nonparametric empirical Bayes g-modeling approach to estimate the optimal rule in the class. Simulation results and gene network inference in an RNA-seq experiment in mouse show that our approach is comparable to or can outperform a number of state-of-the-art proposals.
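To make the high-dimensional problem concrete, the sketch below shows the sample covariance matrix becoming singular when there are fewer samples than features, and a simple linear shrinkage toward a scaled identity repairing that. This illustrates the "shrinkage" family of existing methods mentioned above, not the empirical Bayes approach of the paper; the fixed weight `alpha`, the function name and the simulated data are assumptions.

```python
import numpy as np

def shrink_covariance(S, alpha):
    """Linear shrinkage of S toward a scaled identity.

    alpha in [0, 1] is a hypothetical, hand-picked weight; practical
    methods estimate it from the data.
    """
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target

# Simulated data with fewer samples (5) than features (10).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
S = np.cov(X, rowvar=False)               # 10 x 10, rank at most 4

S_shrunk = shrink_covariance(S, alpha=0.2)
```

The shrunk matrix keeps the trace of `S` but pulls the eigenvalues toward their mean, which lifts the zero eigenvalues and makes the estimate invertible.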
