期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Local multiple imputation

Aerts Marc; Claeskens Gerda; Hens Niel; Molenberghs Geert 《Biometrika》2002,89(2):375-388

相似文献

2.

MagicalRsq: Machine-learning-based genotype imputation quality calibration

《American journal of human genetics》2022,109(11):1986-1997

Download : Download high-res image (200KB)
Download : Download full-size image

相似文献

3.

Genome‐wide association studies based on sequence‐derived genotypes reveal new QTL associated with conformation and performance traits in the Franches–Montagnes horse breed

下载免费PDF全文

M. Frischknecht H. Signer‐Hasler T. Leeb S. Rieder M. Neuditschko 《Animal genetics》2016,47(2):227-229

To identify novel quantitative trait loci (QTL) within horses, we performed genome‐wide association studies (GWAS) based on sequence‐level genotypes for conformation and performance traits in the Franches–Montagnes (FM) horse breed. Sequence‐level genotypes of FM horses were derived by re‐sequencing 30 key founders and imputing 50K data of genotyped horses. In total, we included 1077 FM horses genotyped for ~4 million SNPs and their respective de‐regressed breeding values of the traits in the analysis. Based on this dataset, we identified a total of 14 QTL associated with 18 conformation traits and one performance trait. Therefore, our results suggest that the application of sequence‐derived genotypes increases the power to identify novel QTL which were not identified previously based on 50K SNP chip data. 相似文献

4.

Multiple imputation in the presence of an incomplete binary variable created from an underlying continuous variable

Anneke C. Grobler Katherine Lee 《Biometrical journal. Biometrische Zeitschrift》2020,62(2):467-478

Multiple imputation (MI) is used to handle missing at random (MAR) data. Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form they appear in the analysis model. With an encoded binary variable more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% and 50% of data in the covariate to MAR dependent on the outcome and the auxiliary variable. We compared the performance of five different imputation methods: (a) Imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, then categorizing into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation; (e) substantive-model compatible (SMC) FCS. Bias and standard errors were large when the continuous variable only was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method as it performed best in our simulation studies. 相似文献

5.

Variance estimation under two-phase sampling with application to imputation for missing data 总被引：3，自引：0，他引：3

RAO J. N. K.; SITTER R. R. 《Biometrika》1995,82(2):453-460

相似文献

6.

Comparison of different imputation methods from low- to high-density panels using Chinese Holstein cattle

《Animal : an international journal of animal bioscience》2013,7(5):729-735

Imputation of high-density genotypes from low- or medium-density platforms is a promising way to enhance the efficiency of whole-genome selection programs at low cost. In this study, we compared the efficiency of three widely used imputation algorithms (fastPHASE, BEAGLE and findhap) using Chinese Holstein cattle with Illumina BovineSNP50 genotypes. A total of 2108 cattle were randomly divided into a reference population and a test population to evaluate the influence of the reference population size. Three bovine chromosomes, BTA1, 16 and 28, were used to represent large, medium and small chromosome size, respectively. We simulated different scenarios by randomly masking 20%, 40%, 80% and 95% single-nucleotide polymorphisms (SNPs) on each chromosome in the test population to mimic different SNP density panels. Illumina Bovine3K and Illumina BovineLD (6909 SNPs) information was also used. We found that the three methods showed comparable accuracy when the proportion of masked SNPs was low. However, the difference became larger when more SNPs were masked. BEAGLE performed the best and was most robust with imputation accuracies >90% in almost all situations. fastPHASE was affected by the proportion of masked SNPs, especially when the masked SNP rate was high. findhap ran the fastest, whereas its accuracies were lower than those of BEAGLE but higher than those of fastPHASE. In addition, enlarging the reference population improved the imputation accuracy for BEAGLE and findhap, but did not affect fastPHASE. Considering imputation accuracy and computational requirements, BEAGLE has been found to be more reliable for imputing genotypes from low- to high-density genotyping platforms. 相似文献

7.

Meta-imputation: An efficient method to combine genotype data after imputation with multiple reference panels

《American journal of human genetics》2022,109(6):1007-1015

相似文献

8.

Prospects and limits of marker imputation in quantitative genetic studies in European elite wheat (Triticum aestivum L.)

Sang He Yusheng Zhao M Florian Mette Reiner Bothe Erhard Ebmeyer Timothy F Sharbel Jochen C Reif Yong Jiang 《BMC genomics》2015,16(1)

Background

The main goal of our study was to investigate the implementation, prospects, and limits of marker imputation for quantitative genetic studies contrasting map-independent and map-dependent algorithms. We used a diversity panel consisting of 372 European elite wheat (Triticum aestivum L.) varieties, which had been genotyped with SNP arrays, and performed intensive simulation studies.

Results

Our results clearly showed that imputation accuracy was substantially higher for map-dependent compared to map-independent methods. The accuracy of marker imputation depended strongly on the linkage disequilibrium between the markers in the reference panel and the markers to be imputed. For the decay of linkage disequilibrium present in European wheat, we concluded that around 45,000 markers are needed for low cost, low-density marker profiling. This will facilitate high imputation accuracy, also for rare alleles. Genomic selection and diversity studies profited only marginally from imputing missing values. In contrast, the power of association mapping increased substantially when missing values were imputed.

Conclusions

Imputing missing values is especially of interest for an economic implementation of association mapping in breeding populations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1366-y) contains supplementary material, which is available to authorized users. 相似文献

9.

The effect of high-density genotypic data and different methods on joint genomic prediction: A case study in large white pigs

Wei Zhao Zhenyang Zhang Peipei Ma Zhen Wang Qishan Wang Zhe Zhang Yuchun Pan 《Animal genetics》2023,54(1):45-54

Joint genomic prediction (GP) is an attractive method to improve the accuracy of GP by combining information from multiple populations. However, many factors can negatively influence the accuracy of joint GP, such as differences in linkage disequilibrium phasing between single nucleotide polymorphisms (SNPs) and causal variants, minor allele frequencies and causal variants’ effect sizes across different populations. The objective of this study was to investigate whether the imputed high-density genotype data can improve the accuracy of joint GP using genomic best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP), multi-trait GBLUP (MT-GBLUP) and GBLUP based on genomic relationship matrix considering heterogenous minor allele frequencies across different populations (wGBLUP). Three traits, including days taken to reach slaughter weight, backfat thickness and loin muscle area, were measured on 67 276 Large White pigs from two different populations, for which 3334 were genotyped by SNP array. The results showed that a combined population could substantially improve the accuracy of GP compared with a single-population GP, especially for the population with a smaller size. The imputed SNP data had no effect for single population GP but helped to yield higher accuracy than the medium-density array data for joint GP. Of the four methods, ssGLBUP performed the best, but the advantage of ssGBLUP decreased as more individuals were genotyped. In some cases, MT-GBLUP and wGBLUP performed better than GBLUP. In conclusion, our results confirmed that joint GP could be beneficial from imputed high-density genotype data, and the wGBLUP and MT-GBLUP methods are promising for joint GP in pig breeding. 相似文献

10.

Diversity and linkage disequilibrium in farmed Tasmanian Atlantic salmon 总被引：1，自引：0，他引：1

下载免费PDF全文

J. Kijas N. Elliot P. Kube B. Evans N. Botwright H. King C. R. Primmer K. Verbyla 《Animal genetics》2017,48(2):237-241

Farmed Atlantic salmon (Salmo salar) is a globally important production species, including in Australia where breeding and selection has been in progress since the 1960s. The recent development of SNP genotyping platforms means genome‐wide association and genomic prediction can now be implemented to speed genetic gain. As a precursor, this study collected genotypes at 218 132 SNPs in 777 fish from a Tasmanian breeding population to assess levels of genetic diversity, the strength of linkage disequilibrium (LD) and imputation accuracy. Genetic diversity in Tasmanian Atlantic salmon was lower than observed within European populations when compared using four diversity metrics. The distribution of allele frequencies also showed a clear difference, with the Tasmanian animals carrying an excess of low minor allele frequency variants. The strength of observed LD was high at short distances (<25 kb) and remained above background for marker pairs separated by large chromosomal distances (hundreds of kb), in sharp contrast to the European Atlantic salmon tested. Genotypes were used to evaluate the accuracy of imputation from low density (0.5 to 5 K) up to increased density SNP sets (78 K). This revealed high imputation accuracies (0.89–0.97), suggesting that the use of low density SNP sets will be a successful approach for genomic prediction in this population. The long‐range LD, comparatively low genetic diversity and high imputation accuracy in Tasmanian salmon is consistent with known aspects of their population history, which involved a small founding population and an absence of subsequent introgression. The findings of this study represent an important first step towards the design of methods to apply genomics in this economically important population. 相似文献

11.

Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications

《Animal : an international journal of animal bioscience》2014,8(11):1743-1753

In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rates, that counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rates, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate depends on MAF. Therefore imputation accuracy can be better compared across loci with different MAF. Imputation accuracy depends on the ability of identifying the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panel, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions. 相似文献

12.

Evaluation of missing data imputation methods for human osteometric measurements

Jinyong Pang Xiaoming Liu 《American journal of physical anthropology》2023,181(4):666-676

It is not uncommon for biological anthropologists to analyze incomplete bioarcheological or forensic skeleton specimens. As many quantitative multivariate analyses cannot handle incomplete data, missing data imputation or estimation is a common preprocessing practice for such data. Using William W. Howells' Craniometric Data Set and the Goldman Osteometric Data Set, we evaluated the performance of multiple popular statistical methods for imputing missing metric measurements. Results indicated that multiple imputation methods outperformed single imputation methods, such as Bayesian principal component analysis (BPCA). Multiple imputation with Bayesian linear regression implemented in the R package norm2, the Expectation–Maximization (EM) with Bootstrapping algorithm implemented in Amelia, and the Predictive Mean Matching (PMM) method and several of the derivative linear regression models implemented in mice, perform well regarding accuracy, robustness, and speed. Based on the findings of this study, we suggest a practical procedure for choosing appropriate imputation methods. 相似文献

13.

Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection

《Animal : an international journal of animal bioscience》2016,10(7):1077-1085

Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs), which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors; especially, their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotypes imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval increases, the imputation accuracies decay, although not at an alarming rate. In absence of updating of the reference population, accuracy of GEBVs decays substantially in one or two generations at the rate of 20% to 25% per generation. When the reference population is updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long the reference and training populations are appropriately updated to reflect the genetic change in the population. 相似文献

14.

Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data

Reiter Jerome P. 《Biometrika》2007,94(2):502-508

When performing multi-component significance tests with multiply-imputeddatasets, analysts can use a Wald-like test statistic and areference F-distribution. The currently employed degrees offreedom in the denominator of this F-distribution are derivedassuming an infinite sample size. For modest complete-data samplesizes, this degrees of freedom can be unrealistic; for example,it may exceed the complete-data degrees of freedom. This paperpresents an alternative denominator degrees of freedom thatis always less than or equal to the complete-data denominatordegrees of freedom, and equals the currently employed denominatordegrees of freedom for infinite sample sizes. Its advantagesover the currently employed degrees of freedom are illustratedwith a simulation. 相似文献

15.

Use of partial least squares regression to predict single nucleotide polymorphism marker genotypes when some animals are genotyped with a low-density panel

Dimauro C Steri R Pintus MA Gaspa G Macciotta NP 《Animal : an international journal of animal bioscience》2011,5(6):833-837

High-density single nucleotide polymorphism (SNP) platforms are currently used in genomic selection (GS) programs to enhance the selection response. However, the genotyping of a large number of animals with high-throughput platforms is rather expensive and may represent a constraint for a large-scale implementation of GS. The use of low-density marker (LDM) platforms could overcome this problem, but different SNP chips may be required for each trait and/or breed. In this study, a strategy of imputation independent from trait and breed is proposed. A simulated population of 5865 individuals with a genome of 6000 SNP equally distributed on six chromosomes was considered. First, reference and prediction populations were generated by mimicking high- and low-density SNP platforms, respectively. Then, the partial least squares regression (PLSR) technique was applied to reconstruct the missing SNP in the low-density chip. The proportion of SNP correctly reconstructed by the PLSR method ranged from 0.78 to 0.97 when 90% and 50%, respectively, of genotypes were predicted. Moreover, data sets consisting of a mixture of actual and PLSR-predicted SNP or only actual SNP were used to predict genomic breeding values (GEBVs). Correlations between GEBV and true breeding values varied from 0.74 to 0.76, respectively. The results of the study indicate that the PLSR technique can be considered a reliable computational strategy for predicting SNP genotypes in an LDM platform with reasonable accuracy. 相似文献

16.

Extent to which array genotyping and imputation with large reference panels approximate deep whole-genome sequencing

《American journal of human genetics》2022,109(9):1653-1666

相似文献

17.

Strategies for analysing missing item response data with an application to lung cancer

Sheng X Carrière KC 《Biometrical journal. Biometrische Zeitschrift》2005,47(5):605-615

Missing data problems persist in many scientific investigations. Although various strategies for analyzing missing data have been proposed, they are mainly limited to data on continuous measurements. In this paper, we focus on implementing some of the available strategies to analyze item response data. In particular, we investigate the effects of popular missing data methods on various missing data mechanisms. We examine large sample behaviors of estimators in a simulation study that evaluates and compares their performance. We use data from a quality of life study with lung cancer patients to illustrate the utility of these methods. 相似文献

18.

Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

Andridge RR 《Biometrical journal. Biometrische Zeitschrift》2011,53(1):57-74

In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlations (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random, and cases in which data are missing at random are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. 相似文献

19.

Genetics-informed precision treatment formulation in schizophrenia and bipolar disorder

《American journal of human genetics》2022,109(9):1620-1637

相似文献

20.

Model selection of generalized estimating equations with multiply imputed longitudinal data

Chung‐Wei Shen Yi‐Hau Chen 《Biometrical journal. Biometrische Zeitschrift》2013,55(6):899-911

Longitudinal data often encounter missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been popularly employed for analysis of missing longitudinal data. In particular, the MI‐GEE method has been proposed for inference of generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the “quasi‐likelihood under the independence model criterion” (QIC) and the “missing longitudinal information criterion” (MLIC), to accommodate multiple imputed datasets for selection of the MI‐GEE mean model. According to real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI‐GEE analysis; (ii) the MI‐based GEE model selection methods with a suitable number of imputations generally perform well, while the naive application of existing model selection methods by simply ignoring missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally performs better than their analogies based on proper (Bayesian) multiple imputation. 相似文献