1.

Generalized linear model for interval mapping of quantitative trait loci

Shizhong Xu Zhiqiu Hu 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2010,121(1):47-63

We developed a generalized linear model of QTL mapping for discrete traits in line crossing experiments. Parameter estimation was achieved using two different algorithms, a mixture model-based EM (expectation–maximization) algorithm and a GEE (generalized estimating equation) algorithm under a heterogeneous residual variance model. The methods were developed using ordinal data, binary data, binomial data and Poisson data as examples. Applications of the methods to simulated as well as real data are presented. The two different algorithms were compared in the data analyses. In most situations, the two algorithms were indistinguishable, but when large QTL are located in large marker intervals, the mixture model-based EM algorithm can fail to converge to the correct solutions. Both algorithms were coded in C++ and interfaced with SAS as a user-defined SAS procedure called PROC QTL.

2.

Generalized Linear Model for Mapping Discrete Trait Loci Implemented with LASSO Algorithm

Jun Xing Huijiang Gao Yang Wu Yani Wu Hongwang Li Runqing Yang 《PloS one》2014,9(9)

The generalized estimating equation (GEE) algorithm under a heterogeneous residual variance model extends the iteratively reweighted least squares (IRLS) method from continuous traits to discrete traits. In contrast to the mixture model-based expectation–maximization (EM) algorithm, the GEE algorithm detects quantitative trait loci (QTLs) well and at high computing speed, especially large-effect QTLs located in large marker intervals. Because it is based on a single-QTL model, however, the GEE algorithm has very limited statistical power to detect multiple QTLs, since it ignores other linked QTLs. In this study, the fast least absolute shrinkage and selection operator (LASSO) is derived for the generalized linear model (GLM) with all possible link functions. Under a heterogeneous residual variance model, the LASSO for GLM is used to iteratively estimate the non-zero genetic effects of loci over the entire genome. The iteratively reweighted LASSO is thereby extended to mapping QTLs for discrete traits, such as ordinal, binary, and Poisson traits. Simulated and real data analyses are conducted to demonstrate the efficiency of the proposed method in simultaneously identifying multiple QTLs, using binary and Poisson traits as examples.

3.

Regression modeling of ordinal data with nonzero baselines

Xie M Simpson DG 《Biometrics》1999,55(1):308-316

This paper develops regression models for ordinal data with nonzero control response probabilities. The models are especially useful in dose-response studies where the spontaneous or natural response rate is nonnegligible and the dosage is logarithmic. These models generalize Abbott's formula, which has been commonly used to model binary data with nonzero background observations. We describe a biologically plausible latent structure and develop an EM algorithm for fitting the models. The EM algorithm can be implemented using standard software for ordinal regression. A toxicology data set where the proposed model fits the data but a more conventional model fails is used to illustrate the methodology.
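
For orientation, Abbott's formula for a binary response with a nonzero background rate is commonly written as shown here. The symbols are notation assumed for this note and do not come from the abstract: c is the spontaneous (control) response probability, d the dose, and F(d) the response probability attributable to the dose alone.

```latex
P(\text{response} \mid d) = c + (1 - c)\,F(d), \qquad 0 \le c < 1
```

Setting c = 0 recovers the ordinary dose-response model F(d).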

4.

An EM algorithm for mapping quantitative resistance loci

Xu C Zhang YM Xu S 《Heredity》2005,94(1):119-128

Many disease resistance traits in plants have a polygenic background and the disease phenotypes are modified by environmental factors. As a consequence, the phenotypic values usually show a quantitative variation. The phenotypes of such disease traits, however, are often measured in discrete but ordered categories. These traits are called ordinal traits. In terms of disease resistance, they are called quantitative resistance traits, as opposed to qualitative resistance traits, and are controlled by the quantitative resistance loci (QRL). Classical quantitative trait locus mapping methods are not optimal for ordinal trait analysis because the assumption of normal distribution is violated. Methods for mapping binary trait loci are not suitable either because there are more than two categories in ordinal traits. We developed a maximum likelihood method to map these QRL. The method is implemented via a multicycle expectation-conditional-maximization (ECM) algorithm under the threshold model, where we can estimate both the QRL effects and the thresholds that link the disease liability and the categorical phenotype. The method is verified in simulated data under various combinations of the parameters. An SAS program is available to implement the multicycle ECM algorithm. The program can be downloaded from our website at www.statgen.ucr.edu.
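
The threshold model mentioned above can be illustrated with a minimal sketch of how thresholds turn a latent liability into category probabilities. This is not the authors' ECM implementation: the function names are hypothetical, and the sketch assumes a normally distributed liability with unit residual variance (a probit link).

```python
import math


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(liability_mean, thresholds):
    """Return the probability of each ordered category.

    liability_mean: mean of the latent liability, e.g. intercept plus a
                    locus effect (residual variance assumed to be 1).
    thresholds:     increasing cut points; k thresholds give k + 1 categories.
    """
    # Pad the cut points with -inf and +inf so every category is an interval.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf(c - liability_mean) for c in cuts]
    # Category j covers the liability interval (cuts[j], cuts[j + 1]].
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) - 1)]


if __name__ == "__main__":
    # Three categories; shifting the liability mean upward moves
    # probability mass toward the highest category.
    print(category_probabilities(0.0, [-1.0, 1.0]))
    print(category_probabilities(1.0, [-1.0, 1.0]))
```

In a mapping analysis, the liability mean would differ by genotype at the putative locus, and the thresholds would be estimated jointly with the locus effects.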

5.

Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits

Yi N Banerjee S Pomp D Yandell BS 《Genetics》2007,176(3):1855-1864

Development of statistical methods and software for mapping interacting QTL has been the focus of much recent research. We previously developed a Bayesian model selection framework, based on the composite model space approach, for mapping multiple epistatic QTL affecting continuous traits. In this study we extend the composite model space approach to complex ordinal traits in experimental crosses. We jointly model main and epistatic effects of QTL and environmental factors on the basis of the ordinal probit model (also called threshold model) that assumes a latent continuous trait underlies the generation of the ordinal phenotypes through a set of unknown thresholds. A data augmentation approach is developed to jointly generate the latent data and the thresholds. The proposed ordinal probit model, combined with the composite model space framework for continuous traits, offers a convenient way for genomewide interacting QTL analysis of ordinal traits. We illustrate the proposed method by detecting new QTL and epistatic effects for an ordinal trait, dead fetuses, in an F2 intercross of mice. Utility and flexibility of the method are also demonstrated using a simulated data set. Our method has been implemented in the freely available package R/qtlbim, which greatly facilitates the general usage of the Bayesian methodology for genomewide interacting QTL analysis for continuous, binary, and ordinal traits in experimental crosses.

6.

Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms

Hiroyoshi Iwata Kaworu Ebana Shuichi Fukuoka Jean-Luc Jannink Takeshi Hayashi 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2009,118(5):865-880

Association mapping can be a powerful tool for detecting quantitative trait loci (QTLs) without requiring line-crossing experiments. We previously proposed a Bayesian approach for simultaneously mapping multiple QTLs by a regression method that directly incorporates estimates of the population structure. In the present study, we extended our method to analyze ordinal and censored traits, since both types of traits are common in the evaluation of germplasm collections. Ordinal-probit and tobit models were employed to analyze ordinal and censored traits, respectively. In both models, we postulated the existence of a latent continuous variable associated with the observable data, and we used a Markov-chain Monte Carlo algorithm to sample the latent variable and determine the model parameters. We evaluated the efficiency of our approach by using simulated- and real-trait analyses of a rice germplasm collection. Simulation analyses based on real marker data showed that our models could reduce both false-positive and false-negative rates in detecting QTLs to reasonable levels. Simulation analyses based on highly polymorphic marker data, which were generated by coalescent simulations, showed that our models could be applied to genotype data based on highly polymorphic marker systems, like simple sequence repeats. For the real traits, we analyzed heading date as a censored trait and amylose content and the shape of milled rice grains as ordinal traits. We found significant markers that may be linked to previously reported QTLs. Our approach will be useful for whole-genome association mapping of ordinal and censored traits in rice germplasm collections.

7.

Clustering expressed genes on the basis of their association with a quantitative phenotype

Jia Z Xu S 《Genetical research》2005,86(3):193-207

Cluster analyses of gene expression data are usually conducted based on their associations with the phenotype of a particular disease. Many disease traits have a clearly defined binary phenotype (presence or absence), so that genes can be clustered based on the differences of expression levels between the two contrasting phenotypic groups. For example, cluster analysis based on binary phenotype has been successfully used in tumour research. Some complex diseases have phenotypes that vary in a continuous manner and the method developed for a binary trait is not immediately applicable to a continuous trait. However, understanding the role of gene expression in these complex traits is of fundamental importance. Therefore, it is necessary to develop a new statistical method to cluster expressed genes based on their association with a quantitative trait phenotype. We developed a model-based clustering method to classify genes based on their association with a continuous phenotype. We used a linear model to describe the relationship between gene expression and the phenotypic value. The model effects of the linear model (linear regression coefficients) represent the strength of the association. We assumed that the model effects of each gene follow a mixture of several multivariate Gaussian distributions. Parameter estimation and cluster assignment were accomplished via an Expectation-Maximization (EM) algorithm. The method was verified by analysing two simulated datasets, and further demonstrated using real data generated in a microarray experiment for the study of gene expression associated with Alzheimer's disease.

8.

An Efficient Hierarchical Generalized Linear Mixed Model for Mapping QTL of Ordinal Traits in Crop Cultivars

Jian-Ying Feng Jin Zhang Wen-Jie Zhang Shi-Bo Wang Shi-Feng Han Yuan-Ming Zhang 《PloS one》2013,8(4)

Many important phenotypic traits in plants are ordinal. However, relatively little is known about the methodologies for ordinal trait association studies. In this study, we proposed a hierarchical generalized linear mixed model for mapping quantitative trait loci (QTL) of ordinal traits in crop cultivars. In this model, all the main-effect QTL and QTL-by-environment interactions were treated as random, while population mean, environmental effect and population structure were fixed. For parameter estimation, the pseudo-data normal approximation of the likelihood function and an empirical Bayes approach were adopted. A series of Monte Carlo simulation experiments were performed to confirm the reliability of the new method. The results showed that the new method works well, with satisfactory statistical power and precision. The new method was also used to dissect the genetic basis of soybean alkaline-salt tolerance in 257 soybean cultivars obtained, by stratified random sampling, from 6 geographic ecotypes in China. As a result, 6 main-effect QTL and 3 QTL-by-environment interactions were identified.

9.

Bayesian Linkage Analysis of Categorical Traits for Arbitrary Pedigree Designs

Abra Brisbin Myrna M. Weissman Abby J. Fyer Steven P. Hamilton James A. Knowles Carlos D. Bustamante Jason G. Mezey 《PloS one》2010,5(8)

Background

Pedigree studies of complex heritable diseases often feature nominal or ordinal phenotypic measurements and missing genetic marker or phenotype data.

Methodology

We have developed a Bayesian method for Linkage analysis of Ordinal and Categorical traits (LOCate) that can analyze complex genealogical structure for family groups and incorporate missing data. LOCate uses a Gibbs sampling approach to assess linkage, incorporating a simulated tempering algorithm for fast mixing. While our treatment is Bayesian, we develop a LOD (log of odds) score estimator for assessing linkage from Gibbs sampling that is highly accurate for simulated data. LOCate is applicable to linkage analysis for ordinal or nominal traits, a versatility which we demonstrate by analyzing simulated data with a nominal trait, on which LOCate outperforms LOT, an existing method which is designed for ordinal traits. We additionally demonstrate our method's versatility by analyzing a candidate locus (D2S1788) for panic disorder in humans, in a dataset with a large amount of missing data, which LOT was unable to handle.

Conclusion

LOCate's accuracy and applicability to both ordinal and nominal traits will prove useful to researchers interested in mapping loci for categorical traits.

10.

A Fast EM Algorithm for BayesA-Like Prediction of Genomic Breeding Values

Xiaochen Sun Long Qu Dorian J. Garrick Jack C. M. Dekkers Rohan L. Fernando 《PloS one》2012,7(11)

Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but the Markov Chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed “fastBayesA”) without MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects compared to the commonly used BayesA which bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. Method fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict GEBV as accurately as BayesA but with less computing effort per SNP than BayesA. Method fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers bring unreasonable computational burden or slow convergence to MCMC approaches.

11.

A Multiple-SNP Approach for Genome-Wide Association Study of Milk Production Traits in Chinese Holstein Cattle

Ming Fang Weixuan Fu Dan Jiang Qin Zhang Dongxiao Sun Xiangdong Ding Jianfeng Liu 《PloS one》2014,9(8)

Multiple-SNP analysis, in which the effects of multiple SNPs are simultaneously estimated and tested in a multiple linear regression, has been studied by many researchers. Multiple-SNP association analysis usually has higher power and a lower false-positive rate for detecting causative SNP(s) than single marker analysis (SMA). Several methods have been proposed to simultaneously estimate and test multiple SNP effects. In this research, a fast method called MEML (Mixed model based Expectation-Maximization Lasso algorithm) was developed for simultaneous estimation of multiple SNP effects. An improved Lasso prior was assigned to SNP effects, which were estimated by searching for the maximum joint posterior mode. The residual polygenic effect was included in the model to absorb many tiny SNP effects and is treated as missing data in our EM algorithm. A series of simulation experiments were conducted to validate the proposed method, and the results showed that, compared with SMMA, the new method can dramatically decrease the false-positive rate. The new method was also applied to the 50k SNP-panel dataset for a genome-wide association study of milk production traits in Chinese Holstein cattle. In total, 39 significant SNPs and 25 nearby genes were found. The number of significant SNPs is remarkably smaller than the 105 significant SNPs found by SMMA. Among the 39 significant SNPs, 8 were also found by SMMA, and several well-known QTLs or genes were confirmed again; furthermore, we also identified some positional candidate genes with potential functions affecting milk production traits. These novel findings in our research should be valuable for further investigation.

12.

Multiple-interval mapping for ordinal traits

Li J Wang S Zeng ZB 《Genetics》2006,173(3):1649-1663

Many statistical methods have been developed to map multiple quantitative trait loci (QTL) in experimental cross populations. Among these methods, multiple-interval mapping (MIM) can map QTL with epistasis simultaneously. However, the previous implementation of MIM is for continuously distributed traits. In this study we extend MIM to ordinal traits on the basis of a threshold model. The method inherits the properties and advantages of MIM and can fit a model of multiple QTL effects and epistasis on the underlying liability score. We study a number of statistical issues associated with the method, such as the efficiency and stability of maximization and model selection. We also use computer simulation to study the performance of the method and compare it to other alternative approaches. The method has been implemented in QTL Cartographer to facilitate its general usage for QTL mapping data analysis on binary and ordinal traits.

13.

Estimated Breeding Values for Canine Hip Dysplasia Radiographic Traits in a Cohort of Australian German Shepherd Dogs

Bethany J. Wilson Frank W. Nicholas John W. James Claire M. Wade Peter C. Thomson 《PloS one》2013,8(10)

Canine hip dysplasia (CHD) is a serious and common musculoskeletal disease of pedigree dogs and therefore represents both an important welfare concern and an imperative breeding priority. The typical heritability estimates for radiographic CHD traits suggest that the accuracy of breeding dog selection could be substantially improved by the use of estimated breeding values (EBVs) in place of selection based on phenotypes of individuals. The British Veterinary Association/Kennel Club scoring method is a complex measure composed of nine bilateral ordinal traits, intended to evaluate both early and late dysplastic changes. However, the ordinal nature of the traits may represent a technical challenge for calculation of EBVs using linear methods. The purpose of the current study was to calculate EBVs of British Veterinary Association/Kennel Club traits in the Australian population of German Shepherd Dogs, using linear (both as individual traits and a summed phenotype), binary and ordinal methods to determine the optimal method for EBV calculation. Ordinal EBVs correlated well with linear EBVs (r = 0.90–0.99) and somewhat well with EBVs for the sum of the individual traits (r = 0.58–0.92). Correlation of ordinal and binary EBVs varied widely (r = 0.24–0.99) depending on the trait and cut-point considered. The ordinal EBVs have increased accuracy (0.48–0.69) of selection compared with accuracies from individual phenotype-based selection (0.40–0.52). Despite the high correlations between linear and ordinal EBVs, the underlying relationship between EBVs calculated by the two methods was not always linear, leading us to suggest that ordinal models should be used wherever possible. As the population of German Shepherd Dogs which was studied was purportedly under selection for the traits studied, we examined the EBVs for evidence of a genetic trend in these traits and found substantial genetic improvement over time. This study suggests the use of ordinal EBVs could increase the rate of genetic improvement in this population.

14.

Statistical optimization of parametric accelerated failure time model for mapping survival trait loci

Piao Z Zhou X Yan L Guo Y Yang R Luo Z Prows DR 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2011,122(5):855-863

Most existing statistical methods for mapping quantitative trait loci (QTL) are not suitable for analyzing survival traits with a skewed distribution and censoring mechanism. As a result, researchers incorporate parametric and semi-parametric models of survival analysis into the framework of interval mapping for QTL controlling survival traits. In survival analysis, the accelerated failure time (AFT) model is considered a de facto standard and a fundamental model for data analysis. Based on the AFT model, we propose a parametric approach for mapping survival traits using the EM algorithm to obtain the maximum likelihood estimates of the parameters. Also, with the Bayesian information criterion (BIC) as a model selection criterion, an optimal mapping model is constructed by choosing specific error distributions with maximum likelihood and parsimonious parameters. Two real datasets were analyzed by our proposed method for illustration. The results show that among the five commonly used survival distributions, the Weibull distribution is the optimal survival function for mapping of heading time in rice, while the log-logistic distribution is the optimal one for hyperoxic acute lung injury.
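
For reference, the standard definition of the BIC used for this kind of model comparison is given here. The symbols are notation assumed for this note and do not come from the abstract: \hat{L} is the maximized likelihood under a candidate error distribution, k the number of free parameters, and n the sample size.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

The candidate distribution with the smallest BIC is preferred, which rewards a high likelihood while penalizing additional parameters.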

15.

LOT: a tool for linkage analysis of ordinal traits for pedigree data

Zhang M Feng R Chen X Hu B Zhang H 《Bioinformatics (Oxford, England)》2008,24(15):1737-1739

SUMMARY: Existing linkage-analysis methods address binary or quantitative traits. However, many complex diseases and human conditions, particularly behavioral disorders, are rated on ordinal scales. Herein, we introduce LOT, a tool that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait. The likelihood-ratio test is used for testing evidence of linkage. AVAILABILITY: The LOT program is available for download at http://c2s2.yale.edu/software/LOT/

16.

A score test for linkage analysis of ordinal traits based on IBD sharing

Feng R Zhang H 《Biostatistics (Oxford, England)》2008,9(1):114-127

Statistical methods for linkage analysis are well established for both binary and quantitative traits. However, numerous diseases including cancer and psychiatric disorders are rated on discrete ordinal scales. To analyze pedigree data with ordinal traits, we recently proposed a latent variable model which has higher power to detect linkage using ordinal traits than methods using the dichotomized traits. The challenge with the latent variable model is that the likelihood is usually very complicated, and as a result, the computation of the likelihood ratio statistic is too intensive for large pedigrees. In this paper, we derive a computationally efficient score statistic based on the identity-by-descent sharing information between relatives. Using simulation studies, we examined the asymptotic distribution of the test statistic and the power of our proposed test under various levels of heritability. We compared the computing time as well as power of the score test with the likelihood ratio test. We then applied our method to the Collaborative Study on the Genetics of Alcoholism and performed a genome scan to map susceptibility genes for alcohol dependence. We found a strong linkage signal on chromosome 4.

17.

Application of dynamic metabolic flux analysis for process modeling: Robust flux estimation with regularization, confidence bounds, and selection of elementary modes

Lukas Hebing Tobias Neymann Sebastian Engell 《Biotechnology and bioengineering》2020,117(7):2058-2073

In macroscopic dynamic models of fermentation processes, elementary modes (EM) derived from metabolic networks are often used to describe the reaction stoichiometry in a simplified manner and to build predictive models by parameterizing kinetic rate equations for the EM. In this procedure, the selection of a set of EM is a key step which is followed by an estimation of their reaction rates and of the associated confidence bounds. In this paper, we present a method for the computation of reaction rates of cellular reactions and EM as well as an algorithm for the selection of EM for process modeling. The method is based on the dynamic metabolic flux analysis (DMFA) proposed by Leighty and Antoniewicz (2011, Metab Eng, 13(6), 745–755) with additional constraints, regularization and analysis of uncertainty. Instead of using estimated uptake or secretion rates, concentration measurements are used directly to avoid an amplification of measurement errors by numerical differentiation. It is shown that the regularized DMFA for EM method is significantly more robust against measurement noise than methods using estimated rates. The confidence intervals for the estimated reaction rates are obtained by bootstrapping. For the selection of a set of EM for a given stoichiometric model, the DMFA for EM method is combined with a multiobjective genetic algorithm. The method is applied to real data from a CHO fed-batch process. From measurements of six fed-batch experiments, 10 EM were identified as the smallest subset of EM based upon which the data can be described sufficiently accurately by a dynamic model. The estimated EM reaction rates and their confidence intervals at different process conditions provide useful information for the kinetic modeling and subsequent process optimization.

18.

Genetic analysis of growth curves using the SAEM algorithm

Florence Jaffrézic Cristian Meza Marc Lavielle Jean-Louis Foulley 《Genetics Selection Evolution》2006,38(6):583-600

The analysis of nonlinear function-valued characters is very important in genetic studies, especially for growth traits of agricultural and laboratory species. Inference in nonlinear mixed effects models is, however, quite complex and is usually based on likelihood approximations or Bayesian methods. The aim of this paper was to present an efficient stochastic EM procedure, namely the SAEM algorithm, which is much faster to converge than the classical Monte Carlo EM algorithm and Bayesian estimation procedures, does not require specification of prior distributions and is quite robust to the choice of starting values. The key idea is to recycle the simulated values from one iteration to the next in the EM algorithm, which considerably accelerates the convergence. A simulation study is presented which confirms the advantages of this estimation procedure in the case of a genetic analysis. The SAEM algorithm was applied to real data sets on growth measurements in beef cattle and in chickens. The proposed estimation procedure, as the classical Monte Carlo EM algorithm, provides significance tests on the parameters and likelihood based model comparison criteria to compare the nonlinear models with other longitudinal methods.

19.

Mixed-model analysis of a censored normal distribution with reference to animal breeding

A L Carriquiry D Gianola R L Fernando 《Biometrics》1987,43(4):929-939

A mixed-model procedure for analysis of censored data assuming a multivariate normal distribution is described. A Bayesian framework is adopted which allows for estimation of fixed effects and variance components and prediction of random effects when records are left-censored. The procedure can be extended to right- and two-tailed censoring. The model employed is a generalized linear model, and the estimation equations resemble those arising in analysis of multivariate normal or categorical data with threshold models. Estimates of variance components are obtained using expressions similar to those employed in the EM algorithm for restricted maximum likelihood (REML) estimation under normality.

20.

A fast algorithm for functional mapping of complex traits

Zhao W Wu R Ma CX Casella G 《Genetics》2004,167(4):2133-2137

By integrating the underlying developmental mechanisms for the phenotypic formation of traits into a mapping framework, functional mapping has emerged as an important statistical approach for mapping complex traits. In this note, we explore the feasibility of using the simplex algorithm as an alternative to solve the mixture-based likelihood for functional mapping of complex traits. The results from the simplex algorithm are consistent with those from the traditional EM algorithm, but the simplex algorithm has considerably reduced computational times. Moreover, because of its nonderivative nature and easy implementation with current software, the simplex algorithm enjoys an advantage over the EM algorithm in the dynamic modeling and analysis of complex traits.