Related articles
20 related articles found.
1.
Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. Potentially, this constraint can be overcome by genotyping selection candidates for a low-density (low-cost) panel of SNPs with sparse genotype coverage, then imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole genome re‐sequence data from moderate-density SNP panels. Such a strategy could lead to an accurate prediction of genomic estimated breeding values across breeds, for example. We used genotypes from 48 640 (50K) SNPs genotyped in four sheep breeds to investigate both the accuracy of imputation of the 50K SNPs from low-density SNP panels and the prospects for imputing very dense or whole genome re‐sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Accuracy of imputation was low if the sparse panel had fewer than 5000 (5K) markers. Across breeds, it was clear that the accuracy of imputing from sparse marker panels to 50K was higher if the genetic diversity within a breed was lower, such that relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when the imputation was performed within breed rather than when pooling all the data, despite the fact that the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower at 71%, despite a large number of animals with full genotypes (2215) being used as a reference. For all breeds, the relationship of individuals to the reference explained up to 64% of the variation in accuracy of imputation, demonstrating that accuracy of imputation can be increased if sires and other ancestors of the individuals to be imputed are included in the reference population. The accuracy of imputation could also be increased if pedigree information were available and used to track the inheritance of large chromosome segments within families. In our study, we only considered methods of imputation based on population‐wide linkage disequilibrium (largely because the pedigree for some of the populations was incomplete). Finally, in the scenarios designed to mimic imputation of high-density or whole genome re‐sequence data from the 50K panel, the accuracy of imputation was much higher (86–96%). This is promising, suggesting that in silico genome re‐sequencing is possible in sheep if a suitable pool of key ancestors is sequenced for each breed.
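As an illustration only (not code from the study), the sketch below shows one way to estimate imputation accuracy: mask most SNPs in a validation set, fill them in with a crude nearest-neighbour rule standing in for real imputation software, and correlate true with imputed genotypes per SNP. All sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ref, n_val, n_snp = 200, 50, 1000           # toy sizes, not the paper's data

# simulate 0/1/2 genotype dosages with shared allele frequencies
freqs = rng.uniform(0.05, 0.5, n_snp)
ref = rng.binomial(2, freqs, (n_ref, n_snp)).astype(float)
val_true = rng.binomial(2, freqs, (n_val, n_snp)).astype(float)

# "sparse panel": keep only 10% of SNPs on the low-density chip
on_sparse = rng.random(n_snp) < 0.10
sparse_idx = np.where(on_sparse)[0]
dense_idx = np.where(~on_sparse)[0]
val_obs = np.where(on_sparse, val_true, np.nan)   # masked genotypes to impute

# placeholder imputation: average the k reference animals most similar at the
# sparse SNPs (a crude stand-in for LD/haplotype-based imputation software)
k = 10
val_imp = val_true.copy()
for i in range(n_val):
    d = np.abs(ref[:, sparse_idx] - val_obs[i, sparse_idx]).sum(axis=1)
    nearest = np.argsort(d)[:k]
    val_imp[i, dense_idx] = ref[nearest][:, dense_idx].mean(axis=0)

# per-SNP imputation accuracy = correlation(true, imputed) over validation animals
acc = []
for j in dense_idx:
    if val_true[:, j].std() > 0 and val_imp[:, j].std() > 0:
        acc.append(np.corrcoef(val_true[:, j], val_imp[:, j])[0, 1])
print(f"mean per-SNP imputation accuracy: {np.mean(acc):.2f}")
```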

2.
3.
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of the correctness of imputation and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and the imputation error rate, which counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than the imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas the imputation error rate does. Imputation accuracy can therefore be compared more fairly across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals genotyped with the high-density panel, the SNP density of the low- and high-density panels, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
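A small, hypothetical numerical sketch of the measures discussed above: the error rate of best-guess genotypes, the correlation-based accuracy of imputed gene dosages, and an accuracy computed after centring and scaling both vectors. The toy genotype and dosage values are invented.

```python
import numpy as np

# toy data: true genotypes (0/1/2), imputed gene dosages, and best-guess genotypes
true_g = np.array([0, 1, 2, 1, 0, 2, 1, 1, 0, 2], dtype=float)
dosage = np.array([0.1, 1.2, 1.8, 1.6, 0.2, 1.7, 1.1, 0.8, 0.3, 1.9])
bestcall = np.round(dosage)                    # most likely genotype

# imputation error rate: share of incorrectly imputed genotypes (best-guess only)
error_rate = np.mean(bestcall != true_g)

# imputation accuracy: correlation between true and imputed genotypes,
# here computed on the dosages; unlike the error rate it is not tied to MAF
accuracy = np.corrcoef(true_g, dosage)[0, 1]

# accuracy after centring and scaling both vectors (a per-animal version would
# loop over animals and their SNPs, scaling by allele frequencies)
z = lambda x: (x - x.mean()) / x.std()
accuracy_scaled = np.mean(z(true_g) * z(dosage))
print(error_rate, accuracy, accuracy_scaled)
```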

4.
Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated that are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation; after 25 generations, the accuracy was only 7% lower than in the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, the decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing populations is small. As the generational interval increases, the imputation accuracies decay, although not at an alarming rate. In the absence of updating of the reference population, the accuracy of GEBVs decays substantially in one or two generations, at a rate of 20% to 25% per generation. When the reference population is updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long as the reference and training populations are appropriately updated to reflect the genetic change in the population.

5.
6.
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy, due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high‐density (HighD) array; the low‐density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome‐wide association analysis and genomic prediction in Labrador Retrievers.

7.
Effects of genotype and environment on the main quality trait parameters of wheat
Using trial results from eight winter wheat cultivars (lines) grown at eight locations in 2002, the effects of cultivar (line), environment and cultivar-by-environment interaction on major quality traits were analysed, including glutenin macropolymer (GMP) and its composition, mixograph parameters and baking quality. The results showed that genotype had a significant effect on GMP and on high- and low-molecular-weight glutenin subunits, indicating that GMP and its composition are mainly under genotypic control; sedimentation value, midline peak time (MPT) and bandwidth at 8 min (8TW) were less affected by environment than by genotype; and cultivar, environment and their interaction all had significant effects on loaf volume. The correlations among wheat quality traits were affected by environmental conditions and differed among locations. The cultivar (line)-by-location interaction effects differed among locations for the same cultivar, and even under unfavourable environments some cultivars (lines) performed well. Considering the overall effects on baking quality, the Yantai location and the cultivar Jimai 20 performed best. Therefore, when evaluating quality across locations, one should consider not only variation in protein content but also the patterns of variation in protein quality, GMP and its composition, sedimentation value, midline peak time and bandwidth at 8 min.
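To make the genotype-by-environment analysis concrete, here is a hedged sketch of a two-way ANOVA partitioning cultivar, location and interaction effects for one quality trait using statsmodels; the cultivar and location labels and trait values are entirely hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
cultivars = [f"G{i}" for i in range(1, 9)]
locations = [f"L{j}" for j in range(1, 9)]

# hypothetical long-format table: two replicates per cultivar x location cell
rows = [{"cultivar": g, "location": e,
         "gmp": 2.0 + 0.1 * i + 0.05 * j + rng.normal(0, 0.05)}
        for _ in range(2)
        for i, g in enumerate(cultivars)
        for j, e in enumerate(locations)]
df = pd.DataFrame(rows)

# two-way ANOVA: trait ~ genotype + environment + genotype:environment
model = ols("gmp ~ C(cultivar) * C(location)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```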

8.
9.
Objective: To screen for factors influencing hypertensive heart disease (HHD) and to build prediction models for HHD that can provide early warning of its occurrence. Methods: Patients with a primary diagnosis of hypertensive heart disease or hypertension between 1 January 2016 and 31 December 2019 were selected from the data research institute platform of a medical university in Chongqing, China. Influencing factors of HHD were screened by univariate and multivariate analyses, and logistic regression, random forest (RF) and extreme gradient boosting (XGBoost) models were built in R. Results: Univariate analysis identified 60 differential indicators, and multivariate analysis identified 18 (P<0.05). The areas under the curve (AUC) of the logistic regression, RF and XGBoost models were 0.979, 0.983 and 0.990, respectively. Conclusion: The three HHD prediction models built here gave stable results, and the XGBoost model in particular showed good diagnostic performance for the occurrence of HHD.
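A minimal sketch of this kind of model comparison, using scikit-learn on synthetic data; a gradient-boosting classifier stands in for XGBoost, and the feature matrix is only a stand-in for the 18 screened clinical indicators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the screened indicators (18 predictors, binary outcome)
X, y = make_classification(n_samples=2000, n_features=18, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```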

10.
Latent class regression (LCR) is a popular method for analyzing multiple categorical outcomes. Although nonresponse to the manifest items is a common complication, inferences from LCR can be evaluated using maximum likelihood, multiple imputation, and two‐stage multiple imputation. Under similar missing data assumptions, the estimates and variances from all three procedures are quite close. However, multiple imputation and two‐stage multiple imputation can provide additional information: estimates of the rates of missing information. The methodology is illustrated using an example from a study on racial and ethnic disparities in breast cancer severity.
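As a hedged illustration of the extra information mentioned above, the snippet below applies Rubin's combining rules to hypothetical estimates from m = 5 imputations and reports an approximate rate of missing information for one parameter.

```python
import numpy as np

# estimates and within-imputation variances from m completed-data analyses
q = np.array([0.42, 0.47, 0.40, 0.45, 0.44])       # hypothetical parameter estimates
u = np.array([0.010, 0.011, 0.009, 0.010, 0.010])  # hypothetical variances
m = len(q)

q_bar = q.mean()                      # combined estimate
u_bar = u.mean()                      # within-imputation variance
b = q.var(ddof=1)                     # between-imputation variance
t = u_bar + (1 + 1 / m) * b           # total variance (Rubin's rules)

# (approximate) rate of missing information for this parameter
lam = (1 + 1 / m) * b / t
print(f"estimate {q_bar:.3f}, total var {t:.4f}, missing information ≈ {lam:.2f}")
```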

11.
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS‐MI is pragmatically appealing because of its flexibility in handling different types of variables. JM‐MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single-level and multilevel multiple imputation.
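The sketch below illustrates only the FCS side of this comparison, using scikit-learn's IterativeImputer (chained equations with posterior sampling) on hypothetical mixed data; jomo itself is an R package and is not reproduced here, and the binary column is imputed on a continuous scale in this simplified example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=n)
x3 = (x1 + rng.normal(size=n) > 0).astype(float)    # binary variable coded 0/1
X = np.column_stack([x1, x2, x3])

# impose ~20% missingness completely at random
mask = rng.random(X.shape) < 0.2
X_miss = np.where(mask, np.nan, X)

# FCS-style chained-equation imputation; sample_posterior=True gives draws,
# so running it m times with different seeds yields multiple imputations
imps = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X_miss)
        for s in range(5)]
print(np.mean([imp[:, 2].mean() for imp in imps]))   # pooled mean of the third column
```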

12.
Evidence synthesis, both qualitative and quantitative (the latter through meta-analysis), is central to the development of evidence-based medicine. Unfortunately, meta-analysis is often complicated by the suspicion that the available studies represent a biased subset of the evidence, possibly due to publication bias or other systematically different effects in small studies. A number of statistical methods have been proposed to address this, among which the trim-and-fill method and the Copas selection model are two of the most widely discussed. However, both methods have drawbacks: the trim-and-fill method is based on strong assumptions about the symmetry of the funnel plot; the Copas selection model is less accessible to systematic reviewers and sometimes encounters estimation problems. In this article, we adopt a logistic selection model and show how treatment effects can be rapidly estimated via multiple imputation. Specifically, we impute studies under a missing-at-random assumption and then reweight to obtain estimates under nonrandom selection. Our proposal is computationally straightforward. It allows users to increase selection while monitoring the extent of remaining funnel plot asymmetry, and also to visualize the results using the funnel plot. We illustrate our approach using a small meta-analysis of benign prostatic hyperplasia.

13.
Current manufacturing and development processes for therapeutic monoclonal antibodies demand increasing volumes of analytical testing for both real-time process controls and high-throughput process development. The feasibility of using Raman spectroscopy as an in-line product quality measuring tool has recently been demonstrated and promises to relieve this analytical bottleneck. Here, we resolve the time-consuming calibration process, which requires fractionation and preparative experiments covering variations of product quality attributes (PQAs), by engineering an automation system capable of collecting Raman spectra for on the order of hundreds of calibration points from two to three stock seed solutions differing in protein concentration and aggregate level, using controlled mixing. We used this automated system to calibrate multi-PQA models that accurately measured product concentration and aggregation every 9.3 s using an in-line flow cell. We demonstrate the application of a nonlinear calibration model for monitoring product quality in real time during a biopharmaceutical purification process intended for clinical and commercial manufacturing. These results demonstrate the potential feasibility of implementing quality monitoring during GMP manufacturing, as well as of increasing chemistry, manufacturing, and controls understanding during process development, ultimately leading to more robust and controlled manufacturing processes.
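A hedged sketch of the calibration idea on synthetic data: a multivariate regression (here PLS, as a generic stand-in for the paper's nonlinear model) maps simulated spectra to two product quality attributes; the spectral model, value ranges and names are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 300, 600

# synthetic "spectra": two latent PQAs (concentration, aggregate %) plus noise
conc = rng.uniform(1, 30, n_samples)        # g/L, hypothetical range
agg = rng.uniform(0.5, 5.0, n_samples)      # % aggregate, hypothetical range
basis = rng.normal(size=(2, n_wavenumbers))
spectra = (np.outer(conc, basis[0]) + np.outer(agg, basis[1])
           + rng.normal(scale=0.5, size=(n_samples, n_wavenumbers)))
Y = np.column_stack([conc, agg])

X_tr, X_te, Y_tr, Y_te = train_test_split(spectra, Y, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=8).fit(X_tr, Y_tr)     # multi-PQA calibration
pred = pls.predict(X_te)
rmse = np.sqrt(((pred - Y_te) ** 2).mean(axis=0))
print(f"RMSE concentration: {rmse[0]:.2f} g/L, RMSE aggregate: {rmse[1]:.2f} %")
```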

14.
Various types of unwanted and uncontrollable signal variation in MS‐based metabolomics and proteomics datasets severely disturb the accuracy of metabolite and protein profiling. Therefore, pooled quality control (QC) samples are often employed in quality management processes, which are indispensable to the success of metabolomics and proteomics experiments, especially in high‐throughput cases and long‐term projects. However, data consistency and QC sample stability are still difficult to guarantee because of the complexity of experimental operations and differences between experimenters. To make things worse, numerous proteomics projects do not take QC samples into consideration at the beginning of experimental design. Herein, a powerful and interactive web‐based software tool, named pseudoQC, is presented to simulate QC sample data for actual metabolomics and proteomics datasets using four different machine learning‐based regression methods. The simulated data are used for correction and normalization of the two published datasets, and the obtained results suggest that nonlinear regression methods perform better than linear ones. Additionally, the software is available as a web‐based graphical user interface and can be used by scientists without a bioinformatics background. pseudoQC is open‐source software and is freely available at https://www.omicsolution.org/wukong/pseudoQC/ .
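As a generic illustration of QC-based, regression-driven signal correction (not pseudoQC's actual code), the sketch below fits a nonlinear trend of intensity versus injection order on simulated QC points and divides it out; all signals are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
order = np.arange(200)                                  # injection order
drift = 1.0 + 0.003 * order + 0.1 * np.sin(order / 15)  # slow instrument drift
true_signal = 1000.0
intensity = true_signal * drift * rng.normal(1.0, 0.03, size=order.size)

# pretend every 10th injection is a (simulated) QC sample
qc_idx = order[::10]

# fit a nonlinear trend through the QC intensities and correct all injections
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(qc_idx.reshape(-1, 1), intensity[qc_idx])
trend = rf.predict(order.reshape(-1, 1))
corrected = intensity / (trend / trend.mean())

print(f"CV before: {intensity.std() / intensity.mean():.3f}, "
      f"after: {corrected.std() / corrected.mean():.3f}")
```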

15.
Phylogenetic comparative methods (PCMs) can be used to study evolutionary relationships and trade-offs among species traits. Analysts using PCMs may want to (1) include latent variables, (2) estimate complex trait interdependencies, (3) predict missing trait values, (4) condition predicted traits upon phylogenetic correlations and (5) estimate relationships as slope parameters that can be compared with alternative regression methods. The Comprehensive R Archive Network (CRAN) includes well-documented software for phylogenetic linear models (phylolm), phylogenetic path analysis (phylopath), phylogenetic trait imputation (Rphylopars) and structural equation models (sem), but none of these can simultaneously accomplish all five analytical goals. We therefore introduce a new package, phylosem, for phylogenetic structural equation models (PSEM) and summarize its features and interface. We also describe new analytical options, where users can specify any combination of Ornstein-Uhlenbeck, Pagel's-δ and Pagel's-λ transformations for species covariance. For the first time, we show that PSEM exactly reproduces estimates (and standard errors) for simplified cases that are feasible in sem, phylopath, phylolm and Rphylopars, and demonstrate the approach by replicating a well-known case study involving trade-offs in plant energy budgets.
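A minimal sketch, not the phylosem API, of one building block mentioned above: a Pagel's-λ transformation of a phylogenetic covariance matrix followed by a generalized least-squares slope estimate. The covariance matrix and traits are simulated rather than taken from a real tree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# hypothetical phylogenetic covariance from shared history (any PSD matrix works)
A = rng.normal(size=(n, n))
C = A @ A.T
C = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))   # scale to unit "tip heights"

def pagel_lambda(C, lam):
    """Shrink the off-diagonal (shared-history) covariance by lambda."""
    Cl = lam * C
    np.fill_diagonal(Cl, np.diag(C))
    return Cl

# simulate two correlated traits under the lambda-transformed covariance
L = np.linalg.cholesky(pagel_lambda(C, 0.8))
x = L @ rng.normal(size=n)
y = 0.6 * x + L @ rng.normal(scale=0.5, size=n)

# GLS slope of y on x given the (lambda-transformed) phylogenetic covariance
V = pagel_lambda(C, 0.8)
Vinv = np.linalg.inv(V)
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(f"GLS slope estimate: {beta[1]:.2f}")
```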

16.
In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of using completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modelling the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.
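A toy sketch of a completed-data predictive check for a simple normal model with ignorable missingness, using plug-in imputations for brevity; the data and the chosen discrepancy are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=100)
y[rng.random(100) < 0.3] = np.nan               # 30% missing, assumed ignorable
obs = y[~np.isnan(y)]
n_mis = int(np.isnan(y).sum())

def discrepancy(v):                              # check a tail quantity of interest
    return np.quantile(v, 0.9)

ppp = []
for _ in range(1000):
    # draw imputations from the fitted normal (plug-in "posterior" for simplicity)
    y_comp = np.concatenate([obs, rng.normal(obs.mean(), obs.std(), n_mis)])
    # replicate a full data set from the same model
    y_rep = rng.normal(obs.mean(), obs.std(), y_comp.size)
    ppp.append(discrepancy(y_rep) >= discrepancy(y_comp))
print(f"posterior predictive p-value ≈ {np.mean(ppp):.2f}")
```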

17.
We focus on the problem of generalizing a causal effect estimated in a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity sampling weighting are not designed to handle missing values, which are nevertheless common in both data sources. In addition to coupling the assumptions for causal effect identifiability with those for the missing values mechanism, and to defining appropriate estimation strategies, one difficulty is accounting for the specific structure of the data, with two sources and with treatment and outcome available only in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multisource structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semidiscrete variables. The multiple imputation strategies and the latter alternative rely on different sets of assumptions concerning the impact of missing values on identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to intensive care units. The analysis illustrates how the handling of missing values can affect the conclusion about the effect generalized from the RCT to the target population.
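To fix ideas, here is a hedged sketch of inverse propensity of sampling weighting (IPSW) for generalizing an RCT effect to a target population, on complete synthetic covariates; the article's actual focus, handling missing covariates, is deliberately not addressed in this toy example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_rct, n_obs = 1000, 5000

# covariate distributions differ between the trial and the target population
x_rct = rng.normal(0.5, 1.0, size=(n_rct, 1))
x_obs = rng.normal(0.0, 1.0, size=(n_obs, 1))

# in the RCT: randomized treatment, outcome with a covariate-dependent effect
a = rng.integers(0, 2, n_rct)
y = 1.0 + x_rct[:, 0] + a * (0.5 + 0.5 * x_rct[:, 0]) + rng.normal(size=n_rct)

# model the probability of being sampled into the RCT given covariates
X = np.vstack([x_rct, x_obs])
s = np.concatenate([np.ones(n_rct), np.zeros(n_obs)])
ps = LogisticRegression().fit(X, s).predict_proba(x_rct)[:, 1]
w = (1 - ps) / ps                      # odds weights, up to a constant

# weighted difference in means estimates the target-population average effect
ate = (np.sum(w * a * y) / np.sum(w * a)
       - np.sum(w * (1 - a) * y) / np.sum(w * (1 - a)))
print(f"IPSW-generalized effect ≈ {ate:.2f} (trial-only: "
      f"{y[a == 1].mean() - y[a == 0].mean():.2f})")
```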

18.
The objective was to evaluate the potential use of genotype probabilities to handle records of non-genotyped animals in the context of survival analysis. To do so, the risks associated with the PrP genotype and other transmission factors in relation to clinical scrapie were estimated. Data from 4049 Romanov sheep affected by natural scrapie were analyzed using survival analysis techniques. The original data set included 1310 animals with missing genotypes; five of those had uncensored records. Different missing-genotype-information patterns were simulated for uncensored and censored records. Three strategies differing in the way genotype information was handled were tested. First, records with unknown genotypes were discarded (P1); second, those records were grouped in an unknown class (P2); finally, genotype probabilities were assigned (P3). Whatever the strategy, the ranking of relative risks for the most susceptible genotypes (VRQ-VRQ, ARQ-VRQ and ARQ-ARQ) was similar, even when the non-genotyped animals were not a negligible part of the uncensored records. However, P3 handled missing genotype information more efficiently. Compared with P1, both P2 and P3 avoided discarding the records of non-genotyped animals; in addition, P3 eliminated the unknown class and the risk associated with this group. Genotype probabilities were shown to be a useful technique to handle records of individuals with unknown genotype.
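A hypothetical sketch of how a genotype probability can enter a proportional-hazards model directly as a covariate, using the lifelines package (an assumption, not the software used in the study); the genotype codes, probabilities and effect sizes are invented.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 800

# probability of carrying a susceptible genotype: 0 or 1 when the animal is
# genotyped, an intermediate probability when inferred from relatives
p_susc = np.where(rng.random(n) < 0.7,
                  rng.integers(0, 2, n).astype(float),
                  rng.uniform(0.2, 0.8, n))

# simulate survival times with higher hazard for susceptible genotypes
baseline = rng.exponential(5.0, n)
time = baseline / np.exp(1.2 * p_susc)
event = (time < 4.0).astype(int)                 # administrative censoring at 4 years
time = np.minimum(time, 4.0)

df = pd.DataFrame({"time": time, "event": event, "p_susceptible": p_susc})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)"]])        # hazard ratio for the probability covariate
```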

19.
For the multi-parameter evaluation of groundwater quality, an evaluation model based on an improved Extreme Learning Machine (ELM) was proposed to resolve the fuzziness of water quality evaluation and the incompatibility of water parameters. A training sample set and a testing sample set were randomly generated according to the classification standards of groundwater quality, and the Crow Search Algorithm (CSA) was used to optimize the input weights and thresholds of the hidden-layer neurons of the ELM; thus, the CSA-ELM evaluation model of groundwater quality was constructed by optimizing the ELM with the CSA. Based on the training and testing sample sets, the CSA-ELM model was tested. The test results indicate that the evaluation precision and generalization ability of the CSA-ELM model reach a high level and that it can be used for comprehensive evaluations of groundwater quality. The Jiansanjiang Administration in Heilongjiang Province, China, was used as an example; the groundwater quality of 15 farms in this region was evaluated with the CSA-ELM model. The groundwater quality in this region was generally good and showed spatial distribution characteristics. Compared with the Nemerow Index Method (NIM), the CSA-ELM evaluation model of groundwater quality is more reasonable and can be used for the comprehensive evaluation of groundwater quality. The stability of the NIM, the ELM model, the back propagation (BP) model and the CSA-ELM model was analyzed using the theory of serial number summation and Spearman's correlation coefficient. The stability of the NIM and the BP model in groundwater quality evaluation was poor, while the stability of the ELM and CSA-ELM models was relatively superior. The ranked results for stability are CSA-ELM model > ELM model > NIM > BP model. The reliability of the NIM, ELM model, BP model and CSA-ELM model was analyzed using the theory of distinction degree. The reliability of the NIM was not good, although its distinction degree was large; the distinction degrees of the ELM, BP and CSA-ELM models were close to each other. The ranked results for reliability are CSA-ELM model > ELM model > BP model. The CSA-ELM model can provide a stable and reliable evaluation method for related fields and thus has important practical applicability.
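A minimal numpy sketch of a plain extreme learning machine: random input weights and biases, a sigmoid hidden layer, and output weights solved by least squares. The crow-search optimization of the input weights used in the CSA-ELM model is only noted in a comment, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50, rng=rng):
    # input weights and biases are random in a plain ELM; the CSA-ELM of the
    # paper would instead search for better values of W and b
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # sigmoid hidden layer
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # output weights by least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# toy "water-quality indicator" regression problem
X = rng.uniform(-1, 1, size=(300, 6))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
W, b, beta = elm_fit(X[:200], y[:200])
pred = elm_predict(X[200:], W, b, beta)
print(f"test RMSE: {np.sqrt(np.mean((pred - y[200:]) ** 2)):.3f}")
```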

20.
Stability of protein quality traits in wheat cultivars
Data from 12 wheat cultivars (lines) grown at 12 locations in the regional cultivar trials of the Guanzhong area of Shaanxi Province were used to analyse the effects of cultivar, environment and cultivar-by-environment interaction (CEI) on kernel hardness, protein content, sedimentation value and wet gluten content. The results showed that the genotype effect was significant for all quality parameters; genotype-by-environment interaction had a relatively large effect on sedimentation value but a small effect on kernel hardness, protein content and wet gluten content, whereas the environment effect was relatively large for wet gluten content and kernel hardness but small for protein content and sedimentation value. The regression coefficients (b values) of the protein quality parameters showed that genotypes differed significantly in their response to different environments. For kernel hardness, protein content and wet gluten content, few cultivars showed significant deviations from regression, indicating that the linear regression model accounted for most of the genotypic variation, although the sedimentation values of some cultivars deviated significantly from regression. These results indicate that, to improve the protein quality of wheat cultivars, attention should also be paid to the influence of environment on wheat protein quality.
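As an illustration of the regression-coefficient (b value) stability analysis, the sketch below regresses each cultivar's trait values on an environmental index (the location mean over all cultivars); b near 1 indicates average responsiveness, and the deviation mean square measures departure from the linear model. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cultivars, n_locations = 12, 12

# hypothetical protein-content table: cultivar x location
cultivar_effect = rng.normal(0, 0.6, n_cultivars)
location_effect = rng.normal(0, 1.0, n_locations)
protein = (13.0 + cultivar_effect[:, None] + location_effect[None, :]
           + rng.normal(0, 0.3, (n_cultivars, n_locations)))

env_index = protein.mean(axis=0)                  # environmental index per location

for i in range(n_cultivars):
    # regression of cultivar i on the environmental index: slope b and deviation MS
    b, a = np.polyfit(env_index, protein[i], 1)
    resid = protein[i] - (a + b * env_index)
    dev_ms = np.sum(resid ** 2) / (n_locations - 2)
    print(f"cultivar {i + 1:2d}: b = {b:.2f}, deviation MS = {dev_ms:.3f}")
```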
