
1.

The full EM algorithm for the MLEs of QTL effects and positions and their estimated variances in multiple-interval mapping

Chen Z 《Biometrics》2005,61(2):474-480

The advent of complete genetic linkage maps of DNA markers has made systematic studies of mapping quantitative trait loci (QTL) in experimental organisms feasible. The method of multiple-interval mapping provides an appropriate way for mapping QTL using genetic markers. However, efficient algorithms for the computation involved remain to be developed. In this article, a full EM algorithm for the simultaneous computation of the MLEs of QTL effects and positions is developed. EM-based formulas are derived for computing the observed Fisher information matrix. The full EM algorithm is compared with an ECM algorithm developed by Kao and Zeng (1997, Biometrics 53, 653-665). The validity of the inverted observed Fisher information matrix as an estimate of the variance matrix of the MLEs is demonstrated by a simulation study.

2.

A multivariate model for ordinal trait analysis

Xu S Xu C 《Heredity》2006,97(6):409-417

Many economically important characteristics of agricultural crops are measured as ordinal traits. Statistical analysis of the genetic basis of ordinal traits appears to be quite different from regular quantitative traits. The generalized linear model methodology implemented via the Newton-Raphson algorithm offers improved efficiency in the analysis of such data, but does not take full advantage of the extensive theory developed in the linear model arena. Instead, we develop a multivariate model for ordinal trait analysis and implement an EM algorithm for parameter estimation. We also propose a method for calculating the variance-covariance matrix of the estimated parameters. The EM equations turn out to be extremely similar to formulae seen in standard linear model analysis. Computer simulations are performed to validate the EM algorithm. A real data set is analyzed to demonstrate the application of the method. The advantages of the EM algorithm over other methods are addressed. Application of the method to QTL mapping for ordinal traits is demonstrated using a simulated backcross (BC) population.

3.

An EM algorithm for mapping binary disease loci: application to fibrosarcoma in a four-way cross mouse family

Xu S Yi N Burke D Galecki A Miller RA 《Genetical research》2003,82(2):127-138

Many diseases show dichotomous phenotypic variation but do not follow a simple Mendelian pattern of inheritance. Variances of these binary diseases are presumably controlled by multiple loci and environmental variants. A least-squares method has been developed for mapping such complex disease loci by treating the binary phenotypes (0 and 1) as if they were continuous. However, the least-squares method is not recommended because of its ad hoc nature. Maximum Likelihood (ML) and Bayesian methods have also been developed for binary disease mapping by incorporating the discrete nature of the phenotypic distribution. In the ML analysis, the likelihood function is usually maximized using some complicated maximization algorithms (e.g. the Newton-Raphson or the simplex algorithm). Under the threshold model of binary disease, we develop an Expectation Maximization (EM) algorithm to solve for the maximum likelihood estimates (MLEs). The new EM algorithm is developed by treating both the unobserved genotype and the disease liability as missing values. As a result, the EM iteration equations have the same form as the normal equation system in linear regression. The EM algorithm is further modified to take into account sexual dimorphism in the linkage maps. Applying the EM-implemented ML method to a four-way-cross mouse family, we detected two regions on the fourth chromosome that have evidence of QTLs controlling the segregation of fibrosarcoma, a form of connective tissue cancer. The two QTLs explain 50-60% of the variance in the disease liability. We also applied a Bayesian method previously developed (modified to take into account sex-specific maps) to this data set and detected one additional QTL on chromosome 13 that explains another 26% of the variance of the disease liability. All the QTLs detected primarily show dominance effects.

4.

A general framework for analyzing the genetic architecture of developmental characteristics (total citations: 8; self-citations: 0; citations by others: 8)

Wu R Ma CX Lin M Casella G 《Genetics》2004,166(3):1541-1551

The genetic architecture of growth traits plays a central role in shaping the growth, development, and evolution of organisms. While a limited number of models have been devised to estimate genetic effects on complex phenotypes, no model has been available to examine how gene actions and interactions alter the ontogenetic development of an organism and transform the altered ontogeny into descendants. In this article, we present a novel statistical model for mapping quantitative trait loci (QTL) determining the developmental process of complex traits. Our model is constructed within the traditional maximum-likelihood framework implemented with the EM algorithm. We employ biologically meaningful growth curve equations to model time-specific expected genetic values and the AR(1) model to structure the residual variance-covariance matrix among different time points. Because of a reduced number of parameters being estimated and the incorporation of biological principles, the new model displays increased statistical power to detect QTL exerting an effect on the shape of ontogenetic growth and development. The model allows for the tests of a number of biological hypotheses regarding the role of epistasis in determining biological growth, form, and shape and for the resolution of developmental problems at the interface with evolution. Using our newly developed model, we have successfully detected significant additive x additive epistatic effects on stem height growth trajectories in a forest tree.
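The AR(1) residual structure mentioned in this abstract can be illustrated with a short sketch. It assumes the common stationary form in which the covariance between time points i and j is sigma2 * rho^|i - j|; the function name and the numeric values are hypothetical and are not taken from the paper.

```python
def ar1_covariance(n_times, sigma2, rho):
    """Build an n_times x n_times AR(1) covariance matrix as nested lists.

    Assumed form (not quoted from the paper): cov[i][j] = sigma2 * rho ** |i - j|,
    so only two parameters describe the whole matrix.
    """
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


# Hypothetical values: four measurement times, variance 2.0, correlation 0.5.
cov = ar1_covariance(4, sigma2=2.0, rho=0.5)
for row in cov:
    print(row)
```

The point of the design is parsimony: an unstructured covariance matrix for T time points has T(T+1)/2 free parameters, whereas this form has two regardless of T.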

5.

Estimation of dominance components in noninbred populations by using additive animal model residuals

Chalh A El Gazzah M 《Journal of applied genetics》2002,43(4):471-488

In the case of noninbred and unselected populations with linkage equilibrium, the additive and dominance genetic effects are uncorrelated and the variance-covariance matrix of the second component is simply a product of its variance by a matrix that can be computed from the numerator relationship matrix A. The aim of this study is to present a new approach to estimate the dominance part with a reduced set of equations and hence a lower computing cost. The method proposed is based on the processing of the residual terms resulting from the BLUP methodology applied to an additive animal model. Best linear unbiased prediction of the dominance component d is almost identical to the one given by the full mixed model equations. Based on this approach, an algorithm for restricted maximum likelihood (REML) estimation of the variance components is also presented. By way of illustration, two numerical examples are given and a comparison between the parameters estimated with the expectation maximization (EM) algorithm and those obtained by the proposed algorithm is made. The proposed algorithm is iterative and yields estimates that are close to those obtained by EM, which is also iterative.

6.

Genome-wide evaluation for quantitative trait loci under the variance component model

Lide Han Shizhong Xu 《Genetica》2010,138(9-10):1099-1109

The identity-by-descent (IBD) based variance component analysis is an important method for mapping quantitative trait loci (QTL) in outbred populations. The interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic variances of the entire genome because they require evaluation of multiple models and model selection. In this study, we developed a multiple variance component model for genome-wide evaluation using both the maximum likelihood (ML) method and the MCMC implemented Bayesian method. We placed one QTL in every few cM on the entire genome and estimated the QTL variances and positions simultaneously in a single model. Genomic regions that have no QTL usually showed no evidence of QTL while regions with large QTL always showed strong evidence of QTL. While the Bayesian method produced the optimal result, the ML method is computationally more efficient than the Bayesian method. Simulation experiments were conducted to demonstrate the efficacy of the new methods.

7.

Multivariate whole genome average interval mapping: QTL analysis for multiple traits and/or environments (total citations: 1; self-citations: 0; citations by others: 1)

Verbyla AP Cullis BR 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,125(5):933-953

A major aim in some plant-based studies is the determination of quantitative trait loci (QTL) for multiple traits or across multiple environments. Understanding these QTL by trait or QTL by environment interactions can be of great value to the plant breeder. A whole genome approach for the analysis of QTL is presented for such multivariate applications. The approach is an extension of whole genome average interval mapping in which all intervals on a linkage map are included in the analysis simultaneously. A random effects working model is proposed for the multivariate (trait or environment) QTL effects for each interval, with a variance-covariance matrix linking the variates in a particular interval. The significance of the variance-covariance matrix for the QTL effects is tested and if significant, an outlier detection technique is used to select a putative QTL. This QTL by variate interaction is transferred to the fixed effects. The process is repeated until the variance-covariance matrix for QTL random effects is not significant; at this point all putative QTL have been selected. Unlinked markers can also be included in the analysis. A simulation study was conducted to examine the performance of the approach and demonstrated the multivariate approach results in increased power for detecting QTL in comparison to univariate methods. The approach is illustrated for data arising from experiments involving two doubled haploid populations. The first involves analysis of two wheat traits, α-amylase activity and height, while the second is concerned with a multi-environment trial for extensibility of flour dough. The method provides an approach for multi-trait and multi-environment QTL analysis in the presence of non-genetic sources of variation.

8.

Employing a Monte Carlo Algorithm in Newton-Type Methods for Restricted Maximum Likelihood Estimation of Genetic Parameters

Kaarina Matilainen Esa A. Mäntysaari Martin H. Lidauer Ismo Strandén Robin Thompson 《PloS one》2013,8(12)

Estimation of variance components by Monte Carlo (MC) expectation maximization (EM) restricted maximum likelihood (REML) is computationally efficient for large data sets and complex linear mixed effects models. However, efficiency may be lost due to the need for a large number of iterations of the EM algorithm. To decrease the computing time we explored the use of faster converging Newton-type algorithms within MC REML implementations. The implemented algorithms were: MC Newton-Raphson (NR), where the information matrix was generated via sampling; MC average information (AI), where the information was computed as an average of observed and expected information; and MC Broyden's method, where the zero of the gradient was searched using a quasi-Newton-type algorithm. Performance of these algorithms was evaluated using simulated data. The final estimates were in good agreement with corresponding analytical ones. MC NR REML and MC AI REML enhanced convergence compared to MC EM REML and gave standard errors for the estimates as a by-product. MC NR REML required a larger number of MC samples, while each MC AI REML iteration demanded extra solving of mixed model equations by the number of parameters to be estimated. MC Broyden's method required the largest number of MC samples with our small data and did not give standard errors for the parameters directly. We studied the performance of three different convergence criteria for the MC AI REML algorithm. Our results indicate the importance of defining a suitable convergence criterion and critical value in order to obtain an efficient Newton-type method utilizing a MC algorithm. Overall, use of a MC algorithm with Newton-type methods proved feasible and the results encourage testing of these methods with different kinds of large-scale problem settings.

9.

A general probabilistic model for group independent component analysis and its estimation methods

Guo Y 《Biometrics》2011,67(4):1532-1542

Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a prespecified group design matrix and the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multisubject spatiotemporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures in different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood (ML) method is used for estimating this general group ICA model. We propose two expectation-maximization (EM) algorithms to obtain the ML estimates. The first method is an exact EM algorithm, which provides an exact E-step and an explicit noniterative M-step. The second method is a variational approximation EM algorithm, which is computationally more efficient than the exact EM. In simulation studies, we first compare the performance of the proposed general group PICA model and the existing probabilistic group ICA approach. We then compare the two proposed EM algorithms and show the variational approximation EM achieves comparable accuracy to the exact EM with significantly less computation time. An fMRI data example is used to illustrate application of the proposed methods.

10.

Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes

Daowen Zhang Jie Lena Sun Karen Pieper 《Statistics in biosciences》2016,8(2):220-233

Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study to examine and compare the hospital length of stay (LOS) between patients undertaking percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) from several international clinical trials, we proposed a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOSs where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run since it could not allocate enough memory to invert large dimensional matrices during the optimization process. We consider ways to circumvent the computational problem in the maximum likelihood (ML) inference and restricted maximum likelihood (REML) inference. Particularly, we developed an expectation-maximization (EM) algorithm for the REML inference and presented an ML implementation using existing software. The new REML EM algorithm is easy to implement and computationally stable and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.

11.

Estimating the correlation of pairwise relatedness along chromosomes

Hu XS 《Heredity》2005,94(3):338-346

The 'spatial' pattern of the correlation of pairwise relatedness among loci within a chromosome is an important aspect for an insight into genomic evolution in natural populations. In this article, a statistical genetic method is presented for estimating the correlation of pairwise relatedness among linked loci. The probabilities of identity-in-state (IIS) are related to the probabilities of identity-by-descent (IBD) for the two- and three-loci cases. By decomposing the joint probabilities of two- or three-loci IBD, the probability of pairwise relatedness at a single locus and its correlation among linked loci can be simultaneously estimated. To provide effective statistical methods for estimation, weighted least square (LS) and maximum likelihood (ML) methods are evaluated through extensive Monte Carlo simulations. Results show that the ML method gives a better performance than the weighted LS method with haploid genotypic data. However, there are no significant differences between the two methods when two- or three-loci diploid genotypic data are employed. Compared with the optimal size for haploid genotypic data, a smaller optimal sample size is predicted with diploid genotypic data.

12.

On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci (total citations: 10; self-citations: 0; citations by others: 10)

Kao CH 《Genetics》2000,156(2):855-865

The differences between maximum-likelihood (ML) and regression (REG) interval mapping in the analysis of quantitative trait loci (QTL) are investigated analytically and numerically by simulation. The analytical investigation is based on the comparison of the solution sets of the ML and REG methods in the estimation of QTL parameters. Their differences are found to relate to the similarity between the conditional posterior and conditional probabilities of QTL genotypes and depend on several factors, such as the proportion of variance explained by QTL, relative QTL position in an interval, interval size, difference between the sizes of QTL, epistasis, and linkage between QTL. The differences in mean squared error (MSE) of the estimates, likelihood-ratio test (LRT) statistics in testing parameters, and power of QTL detection between the two methods become larger as (1) the proportion of variance explained by QTL becomes higher, (2) the QTL locations are positioned toward the middle of intervals, (3) the QTL are located in wider marker intervals, (4) epistasis between QTL is stronger, (5) the difference between QTL effects becomes larger, and (6) the positions of QTL get closer in QTL mapping. The REG method is biased in the estimation of the proportion of variance explained by QTL, and it may have a serious problem in detecting closely linked QTL when compared to the ML method. In general, the differences between the two methods may be minor, but can be significant when QTL interact or are closely linked. The ML method tends to be more powerful and to give estimates with smaller MSEs and larger LRT statistics. This implies that ML interval mapping can be more accurate, precise, and powerful than REG interval mapping. The REG method is faster in computation, especially when the number of QTL considered in the model is large. Recognizing the factors affecting the differences between REG and ML interval mapping can help in outlining an efficient strategy that uses both methods in QTL mapping.
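The distinction drawn here between conditional (prior) and conditional posterior QTL genotype probabilities can be made concrete with a minimal EM sketch for ML interval mapping. It is an illustration under stated assumptions, not the method of any listed paper: a single putative QTL with two genotypes (as in a backcross), a common residual variance, and per-individual prior probabilities already computed from flanking markers. The function names and the toy data are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_interval_mapping(y, prior, n_iter=100):
    """EM for a two-genotype normal mixture with individual-specific priors.

    y     : phenotypes
    prior : prior[i] = P(genotype 1 | flanking markers of individual i),
            assumed to be computed elsewhere
    Returns (mu0, mu1, var): the two genotype means and the residual variance.
    """
    n = len(y)
    grand_mean = sum(y) / n
    var = sum((v - grand_mean) ** 2 for v in y) / n
    mu0, mu1 = min(y), max(y)  # crude starting values (an assumption)
    for _ in range(n_iter):
        # E-step: posterior probability of genotype 1 given phenotype and markers.
        w = []
        for yi, pi in zip(y, prior):
            a = pi * normal_pdf(yi, mu1, var)
            b = (1.0 - pi) * normal_pdf(yi, mu0, var)
            w.append(a / (a + b) if a + b > 0.0 else pi)
        # M-step: weighted means and pooled residual variance.
        s1 = sum(w)
        s0 = n - s1  # assumes both genotype classes keep nonzero weight
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / s1
        mu0 = sum((1.0 - wi) * yi for wi, yi in zip(w, y)) / s0
        var = sum(wi * (yi - mu1) ** 2 + (1.0 - wi) * (yi - mu0) ** 2
                  for wi, yi in zip(w, y)) / n
        var = max(var, 1e-12)  # floor to avoid division by zero
    return mu0, mu1, var


# Hypothetical toy data: six individuals with partially informative markers.
phenotypes = [9.8, 10.4, 10.1, 12.2, 11.9, 12.5]
priors = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]
mu0, mu1, var = em_interval_mapping(phenotypes, priors)
print(mu0, mu1, var)
```

Regression interval mapping, by contrast, would regress the phenotypes on the prior probabilities directly and never update them with the phenotype, which is one way to see why the two approaches diverge when the posterior and prior probabilities differ substantially.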

13.

Mapping quantitative trait loci underlying triploid endosperm traits (total citations: 18; self-citations: 0; citations by others: 18)

Xu C He X Xu S 《Heredity》2003,90(3):228-235

Endosperm, which is derived from two polar nuclei fusing with one sperm, is a triploid tissue in cereals. Endosperm tissue determines the grain quality of cereals. Improving grain quality is one of the important breeding objectives in cereals. However, current statistical methods for mapping quantitative trait loci (QTL) under diploid genetic control have not been effective for dealing with endosperm traits because of the complexity of their triploid inheritance. In this paper, we derive for the first time the conditional probabilities of F(3) endosperm QTL genotypes given different flanking marker genotypes in F(2) plants. Using these probabilities, we develop a multiple linear regression method implemented via the iteratively reweighted least-squares (IRWLS) algorithm and a maximum likelihood method (ML) implemented via the expectation-maximization (EM) algorithm to map QTL underlying endosperm traits. We use the mean value of endosperm traits of F(3) seeds as the dependent variable and the expectations of genotypic indicators for additive and dominance effect of a putative QTL flanked by a pair of markers as independent variables for IRWLS mapping. However, if an endosperm trait is measured quantitatively using a single endosperm sample, the ML mapping method can be used to separate the two dominance effects. Efficiency of the methods is verified through extensive Monte Carlo simulation studies. Results of simulation show that the proposed methods provide accurate estimates of both the QTL effects and locations with very high statistical power. With these methods, we are now ready to map endosperm traits, as we can for regular quantitative traits under diploid control.

14.

An expectation and maximization algorithm for estimating Q × E interaction effects

Zhao F Xu S 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,124(8):1375-1387

A Markov chain Monte Carlo (MCMC) implemented Bayesian method has been developed to detect quantitative trait loci (QTL) effects and Q × E interaction effects. However, the MCMC algorithm is time consuming due to repeated samplings of QTL parameters. We developed an expectation and maximization (EM) algorithm as an alternative method for detecting QTL and Q × E interaction. Simulation studies and real data analysis showed that the EM algorithm produced results comparable to those of the Bayesian method, but many orders of magnitude faster than the MCMC algorithm. We used the EM algorithm to analyze a well-known barley dataset produced by the North American Barley Genome Mapping Project. The dataset contained eight quantitative traits collected from 150 doubled-haploid (DH) lines evaluated in multiple environments. Each line was genotyped for 495 polymorphic markers. The result showed that all eight traits exhibited QTL main effects and Q × E interaction effects. On average, the main effects and Q × E interaction effects contributed 34.56 and 16.23% of the total phenotypic variance, respectively. Furthermore, we found that whether or not a locus shows Q × E interaction does not depend on the presence of a main effect.

15.

An improved procedure of mapping a quantitative trait locus via the EM algorithm using posterior probabilities

Saurabh Ghosh Partha P. Majumder 《Journal of genetics》2000,79(2):47-53

Mapping a locus controlling a quantitative genetic trait (e.g. blood pressure) to a specific genomic region is of considerable contemporary interest. Data on the quantitative trait under consideration and several codominant genetic markers with known genomic locations are collected from members of families and statistically analysed to estimate the recombination fraction, θ, between the putative quantitative trait locus and a genetic marker. One of the major complications in estimating θ for a quantitative trait in humans is the lack of haplotype information on members of families. We have devised a computationally simple two-stage method of estimation of θ in the absence of haplotypic information using the expectation-maximization (EM) algorithm. In the first stage, parameters of the quantitative trait locus (QTL) are estimated on the basis of data of a sample of unrelated individuals and Bayes's rule is used to classify each parent into a QTL genotypic class. In the second stage, we have proposed an EM algorithm for obtaining the maximum-likelihood estimate of θ based on data of informative families (which are identified upon inferring parental QTL genotypes performed in the first stage). The purpose of this paper is to investigate whether, instead of using genotypically 'classified' data of parents, the use of posterior probabilities of QTL genotypes of parents at the second stage yields better estimators. We show, using simulated data, that the proposed procedure using posterior probabilities is statistically more efficient than our earlier classification procedure, although it is computationally heavier.

16.

Defining the assumptions underlying modeling of epistatic QTL using variance component methods

Rönnegård L Pong-Wong R Carlborg O 《The Journal of heredity》2008,99(4):421-425

Variance component models are commonly used to detect quantitative trait loci (QTL) in general pedigrees. The variance-covariance structure of the random QTL effect is given by the identity by descent (IBD) between genotypes. Epistatic effects have previously been modeled, both for unlinked and linked loci, as a random effect with a variance-covariance structure given by the Hadamard product between the IBD matrices of the direct QTL effects. In the original papers, the model was given but not derived. Here, we identify the underlying assumptions of this previously proposed model. It assumes that either an unlinked QTL or a fully informative marker (i.e., all marker alleles are unique in the base generation) is located between the loci. We discuss the need of developing a general algorithm to estimate the variance-covariance structure of the random epistatic effect for linked loci.
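The Hadamard (element-wise) product referred to in this abstract can be sketched in a few lines. The two small IBD matrices below are hypothetical illustrations for three individuals, not data from the paper, and the function name is an assumption.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical IBD matrices at two loci for three individuals.
ibd_locus1 = [[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.25],
              [0.0, 0.25, 1.0]]
ibd_locus2 = [[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]]

# Covariance structure of the epistatic random effect under the model
# the abstract describes (up to the epistatic variance scale factor).
epistatic = hadamard(ibd_locus1, ibd_locus2)
for row in epistatic:
    print(row)
```

Note that a pair of individuals contributes to the epistatic covariance only if it shares alleles IBD at both loci, which is why any zero in either input matrix yields a zero in the product.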

17.

Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses

LD E Silva S Wang ZB Zeng 《BMC genetics》2012,13(1):67

BACKGROUND: Although many experiments have measurements on multiple traits, most studies performed the analysis of mapping of quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method for estimating genome-wide significance level of putative QTL effects suitable for the MTMIM model. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software. RESULTS: Throughout the paper, we provide compelling empirical evidence that: (1) the score-based threshold maintains proper type I error rate and tends to keep false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than single trait multiple interval mapping method; (3) an analysis of Drosophila dataset illustrates how the MTMIM method can better extract information from datasets with measurements in multiple traits. CONCLUSIONS: The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL, QTL by environment interaction, and to estimate the total genotypic variance-covariance matrix between traits and to decompose it in terms of QTL-specific variance-covariance matrices, therefore providing more details on the genetic architecture of complex traits.

18.

Maximum likelihood identification of neural point process systems

E. S. Chornoboy L. P. Schramm A. F. Karr 《Biological cybernetics》1988,59(4-5):265-275

Using the theory of random point processes, a method is presented whereby functional relationships between neurons can be detected and modeled. The method is based on a point process characterization involving stochastic intensities and an additive rate function model. Estimates are based on the maximum likelihood (ML) principle and asymptotic properties are examined in the absence of a stationarity assumption. An iterative algorithm that computes the ML estimates is presented. It is based on the expectation/maximization (EM) procedure of Dempster et al. (1977) and makes ML identification accessible to models requiring many parameters. Examples illustrating the use of the method are also presented. These examples are derived from simulations of simple neural systems that cannot be identified using correlation techniques. It is shown that the ML method correctly identifies each of these systems.

19.

A Fast EM Algorithm for BayesA-Like Prediction of Genomic Breeding Values

Xiaochen Sun Long Qu Dorian J. Garrick Jack C. M. Dekkers Rohan L. Fernando 《PloS one》2012,7(11)

Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but the Markov Chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed “fastBayesA”) without MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects compared to the commonly used BayesA which bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. Method fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict GEBV as accurately as BayesA but with less computing effort per SNP than BayesA. Method fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers bring unreasonable computational burden or slow convergence to MCMC approaches.

20.

Correcting the Bias in Estimation of Genetic Variances Contributed by Individual QTL

Luo L Mao Y Xu S 《Genetica》2003,119(2):107-114

In addition to locating chromosomal positions of quantitative trait loci (QTL), estimating the sizes of identified QTL is also an important component in QTL mapping. The size of a QTL is usually measured by the proportion of the phenotypic variance contributed by the QTL. However, the genetic variance may be overestimated in a small line crossing experiment. In this study, we investigate this bias and develop a simple method to correct the bias. The bias correction, however, requires the error of the estimated genetic effect, which is not trivial if the genetic effect is estimated using the Expectation and Maximization (EM) algorithm. Therefore, we also develop a simple method to estimate the standard error of the estimated genetic effect, which is subsequently used to correct the bias in the variance estimate.