Similar Literature (20 records retrieved)
1.
Vasco DA. Genetics. 2008;179(2):951-963.
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data sets. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.
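The core idea of replacing the full likelihood with a vector of summary statistics and fitting parameters by nonlinear least squares can be illustrated with a minimal sketch. The model function, parameter names, and data below are hypothetical placeholders, not Vasco's actual summary-statistic model.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical mapping from parameters to expected summary statistics;
# the real method would use coalescent expectations under growth.
def expected_summaries(theta, bins):
    theta0, growth = theta
    return theta0 * np.exp(-growth * bins)   # placeholder functional form

def residuals(theta, bins, observed):
    return expected_summaries(theta, bins) - observed

rng = np.random.default_rng(0)
bins = np.arange(1.0, 51.0)                              # e.g. frequency or locus bins
observed = expected_summaries([1.0, 0.05], bins) + rng.normal(0, 0.02, bins.size)

fit = least_squares(residuals, x0=[0.5, 0.01], args=(bins, observed))
print("least-squares estimates:", fit.x)
```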

2.
In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decisions. One type of genetic lesion frequently investigated in such disorders is the change of DNA copy number (CN) in specific genomic regions. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, motivated by biological/clinical relevance rather than simply well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with distinctive molecular and clinical features and outperformed standard NMF in terms of the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
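A minimal sketch of the compaction-plus-NMF idea, using scikit-learn and synthetic stand-in data; the window-averaging step and the choice of k here are illustrative assumptions, not the paper's exact procedure or quality measures.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
cn = np.abs(rng.normal(loc=1.0, scale=0.3, size=(60, 2000)))  # stand-in non-negative CN profiles

# Compaction: average probes within fixed genomic windows so NMF stays tractable
window = 10
compact = cn.reshape(cn.shape[0], -1, window).mean(axis=2)

model = NMF(n_components=3, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(compact)          # samples x k mixture coefficients
clusters = W.argmax(axis=1)               # assign each patient to its dominant component
print("fitness (reconstruction error):", model.reconstruction_err_)
```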

3.
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets at the low end of this range. One approach to estimating a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the most well-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.

4.

Background

In quantitative trait mapping and genomic prediction, Bayesian variable selection methods have gained popularity in conjunction with the increase in marker data and computational resources. Whereas shrinkage-inducing methods are common tools in genomic prediction, rigorous decision making in mapping studies using such models is not well established, and the robustness of posterior results is sensitive to misspecified assumptions because biological prior evidence is often weak.

Methods

Here, we evaluate the impact of prior specifications in a shrinkage-based Bayesian variable selection method, presented in a previous study, that places a mixture of uniform priors on genetic marker effects. Unlike most other shrinkage approaches, the use of a mixture of uniform priors provides a coherent framework for inference based on Bayes factors. To evaluate the robustness of genetic association under varying prior specifications, Bayes factors are compared as signals of positive marker association, whereas genomic estimated breeding values are considered for genomic selection. The impact of any specific prior specification is reduced by calculating combined estimates from multiple specifications. A Gibbs sampler is used to perform Markov chain Monte Carlo (MCMC) estimation, and a generalized expectation-maximization algorithm is used as a faster alternative for maximum a posteriori point estimation. The performance of the method is evaluated using two publicly available data examples: the simulated QTLMAS XII data set and a real data set from a population of pigs.

Results

Combined estimates of Bayes factors were very successful in identifying quantitative trait loci, and the ranking of Bayes factors was fairly stable among markers with positive signals of association under varying prior assumptions, although their magnitudes varied considerably. Genomic estimated breeding values obtained with the mixture of uniform priors compared well to other approaches for both data sets, and the loss of accuracy with the generalized expectation-maximization algorithm was small compared to that with MCMC.

Conclusions

Since no error-free method to specify priors is available for complex biological phenomena, exploring a wide variety of prior specifications and combining results provides some solution to this problem. For this purpose, the mixture of uniform priors approach is especially suitable, because it comprises a wide and flexible family of distributions and computationally intensive estimation can be carried out in a reasonable amount of time.
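As a rough illustration of combining Bayes factors across prior settings, the sketch below computes a per-marker Bayes factor from posterior draws of an inclusion indicator (a generic spike-and-slab-style summary, not necessarily the mixture-of-uniform-priors formulation) and pools the values with a geometric mean; both choices are assumptions for illustration only.

```python
import numpy as np

def bayes_factor(inclusion_draws, prior_inclusion):
    """BF of association from MCMC draws (0/1) of a marker's inclusion indicator."""
    post = inclusion_draws.mean()
    post_odds = post / max(1.0 - post, 1e-12)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return post_odds / prior_odds

def combined_bf(bfs):
    """One simple way to pool BFs obtained under several prior specifications."""
    return float(np.exp(np.mean(np.log(bfs))))

rng = np.random.default_rng(0)
draws_by_prior = {0.01: rng.binomial(1, 0.30, 5000), 0.05: rng.binomial(1, 0.45, 5000)}
bfs = [bayes_factor(d, pi) for pi, d in draws_by_prior.items()]
print(bfs, combined_bf(bfs))
```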

5.
A new approach of fitting biomass dynamics models to data
A non-traditional approach to fitting dynamic resource biomass models to data is developed in this paper. A variational adjoint technique is used for dynamic parameter estimation. In the variational formulation, a cost function measuring the distance between the model solution and the observations is minimized. The data assimilation method provides a novel and computationally efficient procedure for combining all available information, i.e., the data and the model, in the analysis of a resource system. This technique is used to analyze data for the North-east Arctic cod stock. Two alternative population growth models, the logistic and the Gompertz, are used for estimating parameters of simple bioeconomic models by the method of constrained least squares. Estimates of the model parameters are reasonable and can be accepted. The main inference from the work is that the average fishing mortality is significantly above the level corresponding to maximum sustainable yield.
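A minimal sketch of fitting a logistic (Schaefer) surplus-production model by constrained least squares with SciPy; the toy catch series and parameter bounds are stand-ins, not the North-east Arctic cod data, and the variational adjoint machinery of the paper is replaced here by a generic optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

def project_biomass(params, catches):
    """Logistic dynamics: B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]."""
    r, K, b0 = params
    B = np.empty(len(catches) + 1)
    B[0] = b0
    for t, c in enumerate(catches):
        B[t + 1] = max(B[t] + r * B[t] * (1.0 - B[t] / K) - c, 1e-6)
    return B[1:]

rng = np.random.default_rng(0)
catches = rng.uniform(0.1, 0.4, 30)                              # toy catch series
obs = project_biomass((0.5, 3.0, 2.5), catches) + rng.normal(0, 0.05, 30)

fit = least_squares(lambda p: project_biomass(p, catches) - obs,
                    x0=(0.3, 2.0, 2.0),
                    bounds=([0.01, 0.5, 0.5], [2.0, 10.0, 10.0]))  # constrained least squares
r_hat, K_hat, b0_hat = fit.x
print({"r": r_hat, "K": K_hat, "B0": b0_hat}, "F_MSY ~ r/2 =", r_hat / 2)
```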

6.
1. Although the home range is a fundamental ecological concept, there is considerable debate over how it is best measured. There is a substantial literature concerning the precision and accuracy of all commonly used home range estimation methods; however, there has been considerably less work concerning how estimates vary with sampling regime, and how this affects statistical inferences. 2. We propose a new procedure, based on a variance components analysis using generalized mixed effects models to examine how estimates vary with sampling regime. 3. To demonstrate the method we analyse data from one study of 32 individually marked roe deer and another study of 21 individually marked kestrels. We subsampled these data to simulate increasingly less intense sampling regimes, and compared the performance of two kernel density estimation (KDE) methods, of the minimum convex polygon (MCP) and of the bivariate ellipse methods. 4. Variation between individuals and study areas contributed most to the total variance in home range size. Contrary to recent concerns over reliability, both KDE methods were remarkably efficient, robust and unbiased: 10 fixes per month, if collected over a standardized number of days, were sufficient for accurate estimates of home range size. However, the commonly used 95% isopleth should be avoided; we recommend using isopleths between 90 and 50%. 5. Using the same number of fixes does not guarantee unbiased home range estimates: statistical inferences differ with the number of days sampled, even if using KDE methods. 6. The MCP method was highly inefficient and results were subject to considerable and unpredictable biases. The bivariate ellipse was not the most reliable method at low sample sizes. 7. We conclude that effort should be directed at marking more individuals monitored over long periods at the expense of the sampling rate per individual. Statistical results are reliable only if the whole sampling regime is standardized. We derive practical guidelines for field studies and data analysis.
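For orientation, the sketch below computes the two home-range estimators discussed most, the MCP area and a kernel-density isopleth area (90%, per the recommendation above), from a set of relocation fixes; the synthetic fixes and grid are placeholders.

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
fixes = rng.normal(size=(120, 2)) * np.array([300.0, 200.0])   # toy relocations in metres

mcp_area = ConvexHull(fixes).volume            # for 2-D hulls, .volume is the area

kde = gaussian_kde(fixes.T)
xg, yg = np.meshgrid(np.linspace(-1500, 1500, 200), np.linspace(-1000, 1000, 200))
dens = kde(np.vstack([xg.ravel(), yg.ravel()]))
cell = (xg[0, 1] - xg[0, 0]) * (yg[1, 0] - yg[0, 0])
order = np.argsort(dens)[::-1]                 # densest grid cells first
cum_mass = np.cumsum(dens[order]) * cell
n_cells = int(np.searchsorted(cum_mass, 0.90 * dens.sum() * cell)) + 1
kde90_area = n_cells * cell                    # area of the 90% isopleth
print(f"MCP area: {mcp_area:.0f} m^2, KDE 90% isopleth: {kde90_area:.0f} m^2")
```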

7.
A capillary gel electrophoretic (CGE) method for the quantitative analysis of RuBisCo in spinach leaves was developed. RuBisCo was resolved into large and small subunits in the presence of sodium dodecyl sulphate (SDS) by the CGE procedure, which enabled accurate determination of the molecular weight of each subunit; the values so determined were in close agreement with those reported using other methods. Advantages of CGE over SDS-polyacrylamide gel electrophoresis and high-pressure gel filtration include decreased sample preparation and analysis time, superior resolution, and greater sensitivity permitting reduced sample size and trace analysis. In addition, CGE provided precise quantification of RuBisCo and was demonstrated to be a viable alternative to other available methods of protein analysis.

8.
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental bias in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth-based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number change while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth-based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information, such as read-pair or split-read, in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
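The bias-correction component can be sketched as a negative binomial regression of per-window read depth on known confounders (GC content and mappability here, as assumed examples), with the ratio of observed to expected depth serving as a copy-number signal that an HMM could then segment. This shows only the regression step, not the full GENSENG model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
gc = rng.uniform(0.3, 0.6, 5000)                            # per-window GC content
mapp = rng.uniform(0.7, 1.0, 5000)                          # per-window mappability
mu = np.exp(3.0 + 2.0 * (gc - 0.45) + 1.5 * (mapp - 0.85))
depth = rng.negative_binomial(n=10, p=10.0 / (10.0 + mu))   # simulated read depth

X = sm.add_constant(np.column_stack([gc, mapp]))
nb = sm.GLM(depth, X, family=sm.families.NegativeBinomial(alpha=0.1)).fit()
expected = nb.fittedvalues                                  # bias-adjusted expected depth
cn_signal = 2.0 * depth / expected                          # ~2 in diploid, bias-corrected windows
```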

9.
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlations (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random, and cases in which data are missing at random are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared.
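The MI variance estimator referred to above is the standard Rubin's-rules combination; a minimal helper makes the within/between decomposition explicit. The bias discussed in the abstract enters through how the imputation model (cluster fixed effects versus a mixed-effects model) affects the between-imputation component.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine m completed-data point estimates and their variances (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    q_bar = estimates.mean()                 # pooled point estimate
    w = variances.mean()                     # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b              # total MI variance
    return q_bar, t

print(rubin_combine([1.8, 2.1, 1.9, 2.0, 2.2], [0.10, 0.12, 0.11, 0.09, 0.10]))
```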

10.

Background

Discovering reliable protein biomarkers is one of the most important issues in biomedical research. ELISA is a traditional technique for accurate quantitation of well-known proteins. Recently, multiple reaction monitoring (MRM) mass spectrometry has been proposed for quantifying newly discovered proteins and has become a popular alternative to ELISA. For MRM data analysis, linear mixed models (LMMs) have been widely used; MSstats, one of the most widely used tools for MRM data analysis, is based on LMMs. However, LMMs often give varying significance results depending on model specification, and it can be difficult to specify a correct LMM for the analysis of MRM data. Here, we propose a new logistic regression-based method for Significance Analysis of Multiple Reaction Monitoring (LR-SAM).

Results

Through simulation studies, we demonstrate that LMM methods may not preserve the type I error rate, thus yielding high false-positive error rates, depending on how random effects are specified. Our simulation study also shows that the LR-SAM approach performs as well as LMM approaches in most cases. However, LR-SAM performs better than the LMMs, particularly when the effect sizes of peptides from the same protein are heterogeneous. Our proposed method was applied to MRM data to identify proteins associated with clinical response to treatment with the tyrosine kinase inhibitor sorafenib in 115 hepatocellular carcinoma (HCC) patients. Of 124 candidate proteins, the LMM approaches provided 6 results that varied in significance, whereas LR-SAM yielded 18 significant results that were highly reproducible.

Conclusion

As exemplified by the application to the HCC data set, LR-SAM identified proteins associated with clinical response to treatment more effectively than the LMM approaches did.
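A minimal sketch of the logistic-regression idea behind LR-SAM: for each protein, regress the binary clinical response on that protein's peptide intensities and use a likelihood-ratio test against the intercept-only model. The exact modeling details of the published method may differ; the data here are stand-ins.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def protein_lr_pvalue(peptides, response):
    """LR test: response ~ peptide intensities of one protein vs. intercept only."""
    X = sm.add_constant(peptides)
    full = sm.Logit(response, X).fit(disp=0)
    null = sm.Logit(response, np.ones((response.size, 1))).fit(disp=0)
    lr = 2.0 * (full.llf - null.llf)
    return stats.chi2.sf(lr, df=peptides.shape[1])

rng = np.random.default_rng(0)
peptides = rng.normal(size=(115, 3))                          # 3 peptides of one protein
response = rng.binomial(1, 1 / (1 + np.exp(-peptides[:, 0])))
print("p-value:", protein_lr_pvalue(peptides, response))
```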

11.
Tan YD. Genomics. 2011;98(5):390-399.
Receiver operating characteristic (ROC) curves have been widely used to evaluate statistical methods, but a fundamental limitation is that ROC analysis cannot evaluate a statistical method's estimation of the false discovery rate (FDR), and hence the area under the curve as a criterion cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiency, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies are significantly different.
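A small simulation illustrates how work efficiency might be computed for one procedure (BH here); power is the proportion of true effects rejected, and the degree of conservativeness is operationalized below as one minus the observed false discovery proportion, which is an assumption for illustration rather than the paper's exact definition.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
m, m1 = 10000, 500                                   # total tests, true effects
z = np.concatenate([rng.normal(3.0, 1.0, m1), rng.normal(0.0, 1.0, m - m1)])
p = 2.0 * stats.norm.sf(np.abs(z))

reject = multipletests(p, alpha=0.05, method="fdr_bh")[0]
power = reject[:m1].mean()
fdp = reject[m1:].sum() / max(reject.sum(), 1)       # observed false discovery proportion
efficiency = power * (1.0 - fdp)                     # power x degree of conservativeness
print(f"power={power:.3f}, FDP={fdp:.3f}, work efficiency={efficiency:.3f}")
```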

12.
Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for the investigation of disease risk, their high noise levels and the complex relationships between predictors and outcomes have brought tremendous analytical challenges. Deep learning models are the state-of-the-art methods for many prediction tasks and a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN) based prediction modeling framework. We first proposed a group-wise feature importance score for feature selection, with which genes harboring genetic variants with both linear and non-linear effects are efficiently detected. We then designed an explainable transfer-learning-based DNN method, which can directly incorporate information from feature selection and accurately capture complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we have demonstrated that our proposed method can not only efficiently detect predictive features but also accurately predict disease risk, compared to many existing methods.
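The group-wise feature importance idea can be sketched as a permutation score computed over all variants of a gene jointly; the model, grouping, and data below are placeholders (a small scikit-learn MLP standing in for the paper's DNN), not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

def group_importance(model, X, y, groups, n_repeats=10, seed=0):
    """Loss increase when all columns of a group (e.g. one gene's variants) are permuted jointly."""
    rng = np.random.default_rng(seed)
    base = log_loss(y, model.predict_proba(X))
    scores = {}
    for gene, cols in groups.items():
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(X.shape[0])
            Xp[:, cols] = X[perm][:, cols]
            deltas.append(log_loss(y, model.predict_proba(Xp)) - base)
        scores[gene] = float(np.mean(deltas))
    return scores

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.5, 400) > 0.5).astype(int)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0).fit(X, y)
groups = {"geneA": [0, 1, 2], "geneB": [3, 4, 5]}          # hypothetical gene-to-column map
print(group_importance(model, X, y, groups))
```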

13.
A robust analysis of comparative genomic microarray data is critical for meaningful genomic comparison studies. In this paper, we compare our method, implemented in a new, freely available software tool (GENCOM), with three commonly used analysis methods: GACK (also freely available), and an empirical cut-off of a twofold difference between the fluorescence intensities after either LOWESS normalization or AVERAGE normalization, in which each fluorescence intensity is divided by the average fluorescence intensity of the entire data set. Each method was tested using data sets from real experiments with prior knowledge of conserved and divergent genes. GENCOM and GACK were superior when a high proportion of genes were divergent. GENCOM was the most suitable method for data sets in which the relationship between the fluorescence intensities was not linear. GENCOM proved robust in the analysis of all the data sets tested.
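The LOWESS-plus-twofold-cutoff baseline can be sketched in a few lines with statsmodels; the synthetic two-channel intensities stand in for real comparative genomic hybridization data, and the cutoff of 1.0 on the log2 scale is the empirical twofold threshold mentioned above.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
R = rng.lognormal(8.0, 1.0, 4000)                     # test-strain channel
G = R * rng.lognormal(0.1, 0.3, 4000)                 # reference channel with intensity bias

M = np.log2(R / G)                                    # per-gene log-ratio
A = 0.5 * np.log2(R * G)                              # per-gene average log-intensity
trend = lowess(M, A, frac=0.3, return_sorted=False)
M_norm = M - trend                                    # LOWESS-normalized log-ratios

divergent = np.abs(M_norm) > 1.0                      # empirical twofold cut-off
print(divergent.sum(), "genes flagged as divergent/absent")
```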

14.
Evolution of proteins is generally modeled as a Markov process acting on each site of the sequence. Replacement frequencies need to be estimated from sequence alignments. Here we compare three approaches: first, the original method by Dayhoff, Schwartz, and Orcutt (1978) Atlas Protein Seq. Struc. 5:345-352; second, the resolvent method (RV) by Müller and Vingron (2000) J. Comput. Biol. 7(6):761-776; and finally a maximum likelihood approach (ML) developed in this paper. We evaluate the methods using a highly divergent and inhomogeneous set of sequence alignments as input to the estimation procedure. ML is the method of choice for small sets of input data. Although the RV method is computationally much less demanding, it performs only slightly worse than ML. Therefore, it is perfectly appropriate for large-scale applications.
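The counting step behind Dayhoff-style estimation can be sketched directly: tally aligned residue pairs, symmetrize, and row-normalize into replacement frequencies. This sketch ignores tree structure, exposure, and divergence correction, which the compared methods handle in different ways.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def replacement_frequencies(pairwise_alignments):
    """Row-normalized replacement frequencies from (seq1, seq2) alignment pairs."""
    counts = np.zeros((20, 20))
    for s1, s2 in pairwise_alignments:
        for a, b in zip(s1, s2):
            if a in IDX and b in IDX:                 # skips gaps and ambiguous residues
                counts[IDX[a], IDX[b]] += 1
                counts[IDX[b], IDX[a]] += 1           # symmetrize
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.clip(row_sums, 1.0, None)

P = replacement_frequencies([("ACDE-KLM", "ACDD-RLM"), ("WYVTS", "WYVAS")])
print(P[IDX["D"], IDX["E"]])                          # frequency of D -> E replacements
```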

15.
In this article, a general procedure is presented for testing for equality of k independent binary response probabilities against any given ordered alternative. The proposed methodology is based on an estimation procedure developed in Hwang and Peddada (1994, Annals of Statistics 22, 67-93) and can be used for a very broad class of order restrictions. The procedure is illustrated through application to two data sets that correspond to three commonly encountered order restrictions: simple tree order, simple order, and down turn order.
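Under the most common of these restrictions, the simple (monotone) order, restricted estimates of the k binomial probabilities are given by weighted isotonic regression (PAVA); the sketch below shows that estimation step only, with toy counts, and is not the more general Hwang-Peddada procedure.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

successes = np.array([3, 7, 9, 15])          # toy data: events per ordered group
trials = np.array([20, 20, 20, 20])
p_hat = successes / trials

iso = IsotonicRegression(increasing=True)
iso.fit(np.arange(p_hat.size), p_hat, sample_weight=trials)
p_restricted = iso.predict(np.arange(p_hat.size))   # PAVA estimates under p1 <= ... <= pk
print(p_restricted)
```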

16.
刘文忠. 遗传. 2004;26(4):532-536.
Theory, method, and application of Method R for estimating (co)variance components are reviewed, with the aim of promoting appropriate use of the method. Estimation requires R values, which are regressions of predicted random effects calculated from the complete data set on predicted random effects calculated from random subsets of the same data. By using a multivariate iterative algorithm based on a transformation matrix, combined with a preconditioned conjugate gradient solver for the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and sampling errors and approximate confidence intervals of the estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data, and biased estimates in small data sets. As an alternative method, Method R can be applied to variance component estimation in large data sets; its theoretical properties should be studied further and its range of application broadened.

17.
Daye ZJ, Chen J, Li H. Biometrics. 2012;68(1):316-326.
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has so far been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and from outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variation and leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
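A heuristic sketch of joint mean/variance regularization: alternate a lasso fit for the mean (with precision weights) and a lasso fit for the log error variance on the squared residuals. This is an illustrative stand-in, not the authors' doubly regularized estimator.

```python
import numpy as np
from sklearn.linear_model import Lasso

def hetero_lasso(X, y, n_iter=5, alpha_mean=0.05, alpha_var=0.05):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = np.ones(y.size)                                    # precision weights
    for _ in range(n_iter):
        sw = np.sqrt(w)
        mean_fit = Lasso(alpha=alpha_mean, fit_intercept=False).fit(Xc * sw[:, None], yc * sw)
        resid2 = (yc - Xc @ mean_fit.coef_) ** 2
        var_fit = Lasso(alpha=alpha_var).fit(Xc, np.log(resid2 + 1e-8))   # log-variance model
        w = 1.0 / np.exp(var_fit.predict(Xc))
    return mean_fit.coef_, var_fit.coef_

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
sigma = np.exp(0.5 * X[:, 1])                              # error variance driven by predictor 1
y = 2.0 * X[:, 0] + rng.normal(0, sigma)
beta_mean, beta_var = hetero_lasso(X, y)
print(beta_mean[:3], beta_var[:3])
```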

18.
The remote identification of forest canopy gaps from Digital Elevation Models (DEMs) built from aerial photographs is potentially a viable alternative to ground-based field surveys. In this study a DEM-based gap-finding algorithm, given suitable experimentally determined input parameters, yielded canopy gap statistics for a study area that were consistent with ground-based survey data from the same area. The method could thus be ‘trained’ to replicate ground-based results for a small test area of beech (Nothofagus) forest, with the potential for it to be applied to larger areas of forest of a similar type to gather canopy gap data with relatively little additional field work. The use of a DEM-based method also has the advantage that the results are easily analysed and mapped using commonly available GIS and cartographic software.
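A minimal sketch of a raster gap-finding step: threshold a canopy height model derived from the DEM, label connected low-canopy regions, and keep those above a minimum size. The height and size thresholds correspond to the experimentally determined input parameters mentioned above; the synthetic raster is a placeholder.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
chm = rng.gamma(shape=2.0, scale=6.0, size=(500, 500))   # stand-in canopy height model (m)

height_threshold = 5.0        # canopy below this height counts as potential gap (tunable)
min_gap_cells = 25            # minimum gap size in raster cells (tunable)

gap_mask = chm < height_threshold
labels, n_regions = ndimage.label(gap_mask)
sizes = ndimage.sum(gap_mask, labels, index=np.arange(1, n_regions + 1))
kept = sizes[sizes >= min_gap_cells]
print(f"{kept.size} gaps, mean size {kept.mean():.1f} cells")
```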

19.
We established a genomic model of quantitative traits with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. A simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance, and that GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value, defined as the sum of additive and dominance effects, had higher prediction accuracy than either additive or dominance GBLUP alone; using causal variants gave the highest GREML and GBLUP accuracy; and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices computed from SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level.
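A minimal numerical sketch of genomic additive and dominance relationship matrices and the resulting BLUP of genetic values, assuming variance components are already known; the additive matrix follows the usual VanRaden-type construction and the dominance coding follows one common parameterization, which may not match the CE/QM formulations exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 1000
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)        # SNP genotypes coded 0/1/2

p = M.mean(axis=0) / 2.0
q = 1.0 - p
Z = M - 2.0 * p
G = Z @ Z.T / (2.0 * np.sum(p * q))                        # additive relationship matrix

W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
D = W @ W.T / np.sum((2.0 * p * q) ** 2)                   # dominance relationship matrix

# BLUP of additive and dominance values with assumed (known) variance components
sa2, sd2, se2 = 0.3, 0.1, 0.6
y = rng.normal(size=n)                                     # stand-in phenotype
V = sa2 * G + sd2 * D + se2 * np.eye(n)
Vinv_y = np.linalg.solve(V, y - y.mean())
u_add = sa2 * G @ Vinv_y
u_dom = sd2 * D @ Vinv_y
u_total = u_add + u_dom                                    # predicted total genetic value
```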

20.

Key message

We propose a novel computational method for genomic selection that combines identical-by-state (IBS)-based Haseman–Elston (HE) regression and best linear prediction (BLP), called HE-BLP.

Abstract

Genomic best linear unbiased prediction (GBLUP) has been widely used in whole-genome prediction for breeding programs. To determine the total genetic variance of a training population, a linear mixed model (LMM) must be solved via restricted maximum likelihood (REML), whose computational complexity is the cube of the sample size. We propose a novel computational method combining identical-by-state (IBS)-based Haseman–Elston (HE) regression and best linear prediction (BLP), called HE-BLP. With this method, the total genetic variance can be estimated by solving a simple HE linear regression, which has a computational complexity of the sample size squared; it is therefore suitable for large-scale genomic data, except in cases where environmental effects need to be estimated simultaneously, because the method does not allow for such estimation. In Monte Carlo simulation studies, the estimated heritability based on HE was identical to that based on REML, and the prediction accuracy via HE-BLP and traditional GBLUP was also quite similar when quantitative trait loci (QTLs) were randomly distributed along the genome and their effects followed a normal distribution. In addition, the kernel row number (KRN) trait in a maize IBM population was used to evaluate the performance of the two methods; the results showed similar prediction accuracy of breeding values despite slightly different estimated heritability via HE and REML, probably due to the underlying genetic architecture. HE-BLP can therefore be a genomic selection method of choice for even larger genomic data sets in cases where environmental effects can be ignored. The software for HE regression and the simulation program is available online in the Genetic Analysis Repository (GEAR; https://github.com/gc5k/GEAR/wiki).
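The two ingredients of HE-BLP can be sketched in a few lines: an O(n^2) Haseman-Elston regression of phenotypic cross-products on off-diagonal relatedness to obtain the genetic variance, followed by best linear prediction using that estimate. The standardized-genotype relationship matrix below is a stand-in for the IBS matrix used by the method, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 1000
geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)

p = geno.mean(axis=0) / 2.0
Z = (geno - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
G = Z @ Z.T / m                                            # genomic relationship matrix

y = geno @ rng.normal(0, 0.03, m) + rng.normal(0, 1.0, n)
yc = y - y.mean()

i, j = np.triu_indices(n, k=1)                             # HE regression over pairs i < j
sg2 = np.polyfit(G[i, j], yc[i] * yc[j], 1)[0]             # slope estimates genetic variance
se2 = yc.var(ddof=1) - sg2
print("h2 (HE):", sg2 / (sg2 + se2))

u = sg2 * G @ np.linalg.solve(sg2 * G + se2 * np.eye(n), yc)   # BLP of genetic values
```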

