Similar Articles
20 similar articles retrieved (search time: 203 ms)
1.

Background

Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used in livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes to sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation–maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those from the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time.

Methods

emBayesR is an approximate EM algorithm that retains the BayesR model assumption that SNP effects are sampled from a mixture of normal distributions with increasing variance. emBayesR differs from other proposed non-MCMC implementations of Bayesian methods for genomic prediction in that it estimates the effect of each SNP while allowing for the error associated with the estimation of all other SNP effects. emBayesR was compared to BayesR using simulated data, and real dairy cattle data with 632 003 genotyped SNPs, to determine whether the MCMC and expectation–maximisation approaches give similar accuracies of genomic prediction.
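
To make the mixture assumption concrete, below is a minimal, illustrative coordinate-wise EM sketch in Python. It assumes four variance classes at (0, 1e-4, 1e-3, 1e-2) times the genetic variance, holds the mixture proportions fixed, and omits the error-propagation (PEV) correction that distinguishes emBayesR itself; all names and starting values are assumptions, not the authors' code.

```python
import numpy as np

def em_bayesr_sketch(X, y, sigma2_g, sigma2_e, n_iter=20):
    """Coordinate-wise EM for a BayesR-style mixture prior (illustration only)."""
    n, p = X.shape
    v = np.array([0.0, 1e-4, 1e-3, 1e-2]) * sigma2_g   # variance classes
    pi = np.array([0.5, 0.3, 0.15, 0.05])              # assumed, held fixed
    beta = np.zeros(p)
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            xtx = xj @ xj
            resid += xj * beta[j]          # leave SNP j out of the residual
            b_ls = (xj @ resid) / xtx      # least-squares estimate for SNP j
            se2 = sigma2_e / xtx           # its sampling variance
            # E-step: responsibility of each variance class for SNP j
            dens = pi * np.exp(-0.5 * b_ls**2 / (v + se2)) / np.sqrt(v + se2)
            w = dens / dens.sum()
            # M-step: mixture-weighted ridge shrinkage of the LS estimate
            beta[j] = np.sum(w * (v / (v + se2))) * b_ls
            resid -= xj * beta[j]
    return beta
```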

Results

We were able to demonstrate, with both simulated and real data, that allowing for the error associated with the estimation of other SNP effects when estimating the effect of each SNP in emBayesR improved the accuracy of genomic prediction over emBayesR without this error correction. When averaged over nine dairy traits, the accuracy of genomic prediction with emBayesR was only 0.5% lower than that from BayesR. However, emBayesR reduced computing time by up to 8-fold compared to BayesR.

Conclusions

The emBayesR algorithm described here achieved accuracies of genomic prediction similar to BayesR for a range of simulated datasets and real 630 K dairy SNP data. emBayesR needs less computing time than BayesR, which will allow it to be applied to larger datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0082-4) contains supplementary material, which is available to authorized users.

2.

Background

The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction.

Methods

Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we evaluated the following genomic prediction models: genome-enabled BLUP (GBLUP), ridge regression BLUP (RRBLUP), principal component analysis followed by ridge regression (RRPCA), BayesC and Bayesian stochastic search variable selection. Prediction accuracy was measured as the correlation between predicted breeding values and observed phenotypes, divided by the square root of the heritability. The data concerned laying hens with phenotypes for number of eggs in the first production period and known genotypes. The hens were from two closely related brown layer lines (B1 and B2) and a third, distantly related white layer line (W1). Each line had 1004 to 1023 training animals and 238 to 240 validation animals. Training datasets consisted of animals from either a single line or a combination of two or all three lines, and included 30 508 to 45 974 segregating single nucleotide polymorphisms.
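
As a concrete illustration of the simplest of these models and of the accuracy metric, here is a hedged RRBLUP-style sketch in Python; the ridge penalty `lam` would normally be derived from variance components, and all names and data are placeholders.

```python
import numpy as np

def rrblup_accuracy_sketch(X_train, y_train, X_val, y_val, h2, lam):
    """Ridge regression on SNP covariates; accuracy = cor(GEBV, phenotype)/sqrt(h2)."""
    p = X_train.shape[1]
    # ridge (RRBLUP-style) solution: (X'X + lam * I)^{-1} X'y
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                           X_train.T @ y_train)
    gebv = X_val @ beta                       # predicted breeding values
    r = np.corrcoef(gebv, y_val)[0, 1]
    return r / np.sqrt(h2)                    # accuracy as defined in the study
```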

Results

Genomic prediction models yielded 0.13 to 0.16 higher accuracies than pedigree-based BLUP. When the line itself was excluded from the training dataset, genomic predictions were generally inaccurate. Use of multiple lines marginally improved prediction accuracy for B2 but left prediction accuracy unchanged or slightly decreased for B1 and W1. Differences between models were generally small, except for RRPCA, which gave considerably higher accuracies for B2. Correlations between genomic predictions from different methods were higher than 0.96 for W1 and higher than 0.88 for B1 and B2. The greater differences between methods for B1 and B2 were probably due to the lower accuracy of predictions for B1 (~0.45) and B2 (~0.40) compared to W1 (~0.76).

Conclusions

Multi-line genomic prediction left prediction accuracy unchanged or slightly improved it for closely related lines. For distantly related lines, multi-line genomic prediction yielded similar or slightly lower accuracies than single-line genomic prediction. Bayesian variable selection and GBLUP generally gave similar accuracies. Overall, RRPCA yielded the greatest accuracies for two lines, suggesting that using PCA helps to alleviate the “n ≪ p” problem in genomic prediction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0057-5) contains supplementary material, which is available to authorized users.

3.

Background

Dominance effects may play an important role in the genetic variation of complex traits. Full-featured and easy-to-use computing tools for genomic prediction and variance component estimation of additive and dominance effects using genome-wide single nucleotide polymorphism (SNP) markers are necessary to understand the dominance contribution to a complex trait and to utilize dominance for selecting individuals with favorable genetic potential.

Results

The GVCBLUP package is a shared-memory parallel computing tool for genomic prediction and variance component estimation of additive and dominance effects using genome-wide SNP markers. This package currently has three main programs (GREML_CE, GREML_QM, and GCORRMX) and a graphical user interface (GUI) that integrates the three main programs with an existing program for the graphical viewing of SNP additive and dominance effects (GVCeasy). The GREML_CE and GREML_QM programs offer complementary computing advantages and give identical results for genomic prediction of breeding values, dominance deviations and genotypic values, and for genomic estimation of additive and dominance variances and heritabilities, using a combination of the expectation-maximization (EM) and average information restricted maximum likelihood (AI-REML) algorithms. GREML_CE is designed for large numbers of SNP markers and GREML_QM for large numbers of individuals. Test results showed that GREML_CE could analyze 50,000 individuals with 400K SNP markers and GREML_QM could analyze 100,000 individuals with 50K SNP markers. GCORRMX calculates genomic additive and dominance relationship matrices using SNP markers. GVCeasy is the GUI for GVCBLUP, integrated with an existing software tool for the graphical viewing of SNP effects and a function for editing the parameter files of the three main programs.
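
To illustrate the kind of relationship matrices GCORRMX computes, below is a minimal Python sketch using a VanRaden-style additive coding and a Vitezica-style dominance coding; GVCBLUP's exact formulas may differ in detail, and the function name is hypothetical.

```python
import numpy as np

def genomic_relationships_sketch(M):
    """Additive (G) and dominance (D) relationship matrices from genotypes
    M (n x m, coded 0/1/2). Illustrative coding, not GVCBLUP's source."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    q = 1.0 - p
    W = M - 2.0 * p                          # centered additive covariates
    G = W @ W.T / np.sum(2.0 * p * q)
    # dominance covariates: -2q^2 (genotype 2), 2pq (1), -2p^2 (0)
    H = np.where(M == 2, -2.0 * q**2,
        np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    D = H @ H.T / np.sum((2.0 * p * q) ** 2)
    return G, D
```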

Conclusion

The GVCBLUP package is a powerful and versatile computing tool for assessing the type and magnitude of genetic effects affecting a phenotype by estimating whole-genome additive and dominance heritabilities, for genomic prediction of breeding values, dominance deviations and genotypic values, for calculating genomic relationships, and for research and education in genomic prediction and estimation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-270) contains supplementary material, which is available to authorized users.

4.

Introduction

With the renewed drive towards malaria elimination, there is a need for improved surveillance tools. While time series analysis is an important tool for surveillance, prediction and measuring the impact of interventions, approximations by commonly used Gaussian methods are prone to inaccuracies when case counts are low. Therefore, statistical methods appropriate for count data are required, especially during the “consolidation” and “pre-elimination” phases.

Methods

Generalized autoregressive moving average (GARMA) models were extended to generalized seasonal autoregressive integrated moving average (GSARIMA) models for parsimonious observation-driven modelling of non-Gaussian, non-stationary and/or seasonal time series of count data. The models were applied to monthly malaria case time series in a district in Sri Lanka, where malaria has decreased dramatically in recent years.
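
A full GSARIMA model has a recursive, observation-driven structure; as a rough stand-in only, the sketch below fits a negative-binomial GLM with a lagged log-count term and deterministic seasonal harmonics. The dispersion value and all names are assumptions, not the authors' model.

```python
import numpy as np
import statsmodels.api as sm

def nb_count_model_sketch(counts):
    """Negative-binomial regression on lagged log-counts plus seasonality,
    a crude observation-driven approximation of a GARMA-type model."""
    y = np.asarray(counts, dtype=float)
    t = np.arange(len(y))
    lagged = np.log(y[:-1] + 1.0)                # one-month lag
    month = t[1:] % 12
    X = sm.add_constant(np.column_stack([
        lagged,
        np.sin(2 * np.pi * month / 12),          # deterministic seasonality
        np.cos(2 * np.pi * month / 12),
    ]))
    family = sm.families.NegativeBinomial(alpha=1.0)   # assumed dispersion
    return sm.GLM(y[1:], X, family=family).fit()
```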

Results

The malaria series showed long-term changes in the mean, unstable variance and seasonality. After fitting negative-binomial Bayesian models, a GSARIMA model and a GARIMA model with deterministic seasonality were both selected, based on different criteria. Posterior predictive distributions indicated that negative-binomial models provided better predictions than Gaussian models, especially when counts were low. The G(S)ARIMA models were able to capture the autocorrelation in the series.

Conclusions

G(S)ARIMA models may be particularly useful in the drive towards malaria elimination, since episode count series are often seasonal and non-stationary, especially when control is increased. Although building and fitting GSARIMA models is laborious, they may provide more realistic prediction distributions than do Gaussian methods and may be more suitable when counts are low.

5.

Background

Concerns about worsening memory (“memory concerns”; MC) and impairment in memory performance are both predictors of Alzheimer's dementia (AD). The relationship between the two in dementia prediction at the pre-dementia disease stage, however, is not well explored. A refined understanding of the contribution of both MC and memory performance to dementia prediction is crucial for defining at-risk populations. We examined the risk of incident AD by MC and memory performance in patients with mild cognitive impairment (MCI).

Methods

We analyzed data from 417 MCI patients in a longitudinal multicenter observational study. Patients were classified based on the presence (n = 305) vs. absence (n = 112) of MC. Risk of incident AD was estimated with Cox proportional hazards regression models.
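
A minimal sketch of such a model with the lifelines library, fitted on purely synthetic data (the column names are hypothetical stand-ins for the study's predictors):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200                                       # synthetic cohort, not the study data
df = pd.DataFrame({
    "memory_concerns": rng.integers(0, 2, n),
    "memory_score": rng.normal(0.0, 1.0, n),
    "apoe4": rng.integers(0, 2, n),
    "time_months": rng.exponential(40.0, n),  # simulated follow-up
    "incident_ad": rng.integers(0, 2, n),     # simulated event indicator
})
cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="incident_ad")
print(cph.summary)   # hazard ratios appear as exp(coef)
```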

Results

Risk of incident AD was increased by MC (HR = 2.55, 95%CI: 1.33–4.89), lower memory performance (HR = 0.63, 95%CI: 0.56–0.71) and ApoE4-genotype (HR = 1.89, 95%CI: 1.18–3.02). An interaction effect between MC and memory performance was observed. The predictive power of MC was greatest for patients with very mild memory impairment and decreased with increasing memory impairment.

Conclusions

Our data suggest that the power of MC as a predictor of future dementia at the MCI stage varies with the patients' level of cognitive impairment. While MC are predictive at early-stage MCI, their predictive value at more advanced stages of MCI is reduced. This suggests that loss of insight related to AD may occur at the late stage of MCI.

6.

Objectives

Rotator cuff tears are a common cause of shoulder disease. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.

Methods

In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation, followed by confirmatory MRI between 2007 and 2011, were identified. MRI was used as the reference standard to classify rotator cuff tears. The predictor variables were the clinical assessment results, which consisted of 16 attributes. This study employed two data mining methods (an artificial neural network (ANN) and a decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratios and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.
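
The post-test probability step is simply Bayes' theorem in odds form, the calculation a Fagan nomogram performs graphically; a short sketch with made-up operating characteristics:

```python
def post_test_probability(pretest_p, sensitivity, specificity, test_positive):
    """Post-test odds = pre-test odds * likelihood ratio (Fagan's nomogram)."""
    lr = (sensitivity / (1.0 - specificity) if test_positive
          else (1.0 - sensitivity) / specificity)
    pre_odds = pretest_p / (1.0 - pretest_p)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)

# e.g. an assumed model with 85% sensitivity and 70% specificity,
# applied to a patient with a 40% pre-test probability of a tear
print(post_test_probability(0.40, 0.85, 0.70, test_positive=True))  # ~0.65
```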

Results

Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than with logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pre-test probability and a prediction result (tear or no tear).

Conclusions

Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears, as well as to determine the probability of the presence of the disease, to enhance diagnostic decision making for rotator cuff tears.

7.

Background

Barrett’s esophagus (BE) occurs as a consequence of reflux and is a risk factor for esophageal adenocarcinoma. The current “gold standard” for diagnosing BE is endoscopy, which remains prohibitively expensive and impractical as a population screening tool. We aimed to develop a pre-screening tool to aid decision making for diagnostic referrals.

Methodology/Principal Findings

A prospective (training) cohort of 1603 patients attending for endoscopy was used to identify risk factors and develop a risk prediction model. Factors associated with BE in the univariate analysis were selected to develop prediction models, which were validated in an independent, external cohort of 477 non-BE patients referred for endoscopy with symptoms of reflux or dyspepsia. Two prediction models were developed: one for columnar-lined epithelium (CLE) of any length and one using a stricter definition of intestinal metaplasia (IM) with segments ≥2 cm, with areas under the ROC curve (AUC) of 0.72 (95% CI: 0.67–0.77) and 0.81 (95% CI: 0.76–0.86), respectively. The two prediction models included demographics (age, sex), symptoms (heartburn, acid reflux, chest pain, abdominal pain) and medication for “stomach” symptoms. These two models were validated in the independent cohort with AUCs of 0.61 (95% CI: 0.54–0.68) and 0.64 (95% CI: 0.52–0.77) for CLE and IM ≥2 cm, respectively.
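
A pre-screening model of this kind amounts to a logistic regression whose predicted risks are thresholded to rule out the lowest-risk patients; a synthetic-data sketch (the predictors and the 20% cut-off are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1603
X = np.column_stack([
    rng.normal(55, 10, n),        # age
    rng.integers(0, 2, n),        # sex
    rng.integers(0, 2, n),        # heartburn
    rng.integers(0, 2, n),        # acid reflux
])
y = rng.integers(0, 2, n)         # synthetic BE status, random here

model = LogisticRegression(max_iter=1000).fit(X, y)
risk = model.predict_proba(X)[:, 1]
print("AUC:", roc_auc_score(y, risk))        # ~0.5 because data are random

# rule out the ~20% of individuals with the lowest predicted risk
cutoff = np.quantile(risk, 0.20)
print("screened out:", (risk < cutoff).mean())
```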

Conclusions

We have identified and validated two prediction models for CLE and IM≥2 cm. Both models have fair prediction accuracies and can select out around 20% of individuals unlikely to benefit from investigation for Barrett’s esophagus. Such prediction models have the potential to generate useful cost-savings for BE screening among the symptomatic population.

8.

Background

Markers that predict the occurrence of complicated disease behavior in patients with Crohn's disease (CD) can permit a more aggressive therapeutic regimen for patients at risk. The aim of this cohort study was to test blood levels of hemoglobin (Hgb) and hematocrit (Hct) for the prediction of complicated CD behavior and CD-related surgery in an adult patient population.

Methods

Blood samples from 62 CD patients of the German Inflammatory Bowel Disease network “Kompetenznetz CED” were tested for levels of Hgb and Hct prior to the occurrence of complicated disease behavior or CD-related surgery. The relationship between these markers and clinical events was studied using Kaplan–Meier survival analysis and adjusted Cox proportional hazards regression models.
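
A sketch of the survival comparison with the lifelines library, run on synthetic data (the grouping column, follow-up times and events are invented for illustration):

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2)
n = 62
df = pd.DataFrame({
    "low_hgb": rng.integers(0, 2, n),      # hypothetical marker grouping
    "months": rng.exponential(50.0, n),    # simulated time to event/censoring
    "event": rng.integers(0, 2, n),        # complication or CD-related surgery
})
kmf = KaplanMeierFitter()
for label, grp in df.groupby("low_hgb"):
    kmf.fit(grp["months"], grp["event"], label=f"low_hgb={label}")
    print(label, kmf.median_survival_time_)

a, b = df[df.low_hgb == 1], df[df.low_hgb == 0]
print(logrank_test(a["months"], b["months"], a["event"], b["event"]).p_value)
```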

Results

The median follow-up time was 55.8 months. Of the 62 CD patients without any previous complication or surgery, 34% developed a complication and/or underwent CD-related surgery. Low Hgb or Hct levels were independent predictors of a shorter time to the first complication or CD-related surgery. This was true for early as well as late occurring complications. Patients with stably low Hgb or Hct during serial follow-up measurements had a higher frequency of complications than patients with stably normal Hgb or Hct, respectively.

Conclusions

Determination of Hgb or Hct in complication- and surgery-naïve CD patients might serve as an additional tool for the prediction of complicated disease behavior.

9.

Background

The recent explosion of biological data poses a great challenge to traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtimes are required for cluster identification. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on pairwise similarities, a similarity matrix must be constructed before it can run, and this construction itself takes a long time.

Methods

Two types of parallel architectures are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system, with its large memory size and great computing capacity, is used for the affinity propagation algorithm. An appropriate data partitioning and reduction scheme is designed to minimize the global communication cost among processes.
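
For reference, the single-node version of both steps (similarity matrix, then affinity propagation on it) fits in a few lines with scikit-learn; the paper's contribution is parallelizing exactly these steps for inputs too large for one machine.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))            # e.g. expression profiles (synthetic)

# similarity matrix: negative squared Euclidean distance (a common choice)
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(S)
print(len(ap.cluster_centers_indices_), "clusters")
```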

Results

A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation algorithm also achieves good performance when clustering large-scale gene expression (microarray) data and detecting families in large protein superfamilies.

10.

Background

Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information.

Methods

A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method, and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method.
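
Whatever the covariates (individual markers or genealogy-based haplotype clusters), the GBLUP variant reduces to one mixed-model solve; a minimal Python sketch under an assumed heritability, with hypothetical names:

```python
import numpy as np

def gblup_from_covariates_sketch(Z, y, h2):
    """GBLUP sketch: build G from covariates Z (markers or haplotype
    cluster counts), then BLUP the genetic values for y = mu + g + e."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    G = Zc @ Zc.T / Zc.shape[1]              # simple genomic relationship matrix
    G += np.eye(n) * 1e-6                    # stabilize the solve
    lam = (1.0 - h2) / h2                    # sigma2_e / sigma2_g
    gebv = G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())
    return gebv
```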

Results

About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees than when fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, the accuracy of prediction was less sensitive to the parameter π when fitting haplotypes than when fitting markers.

Conclusions

Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

11.

Background

Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we present a method for computing genomic relationships using X chromosome markers, investigate the accuracy of imputation from a low-density (7K) panel to the 54K SNP (single nucleotide polymorphism) panel, and compare the accuracy of genomic prediction with and without X chromosome markers.

Methods

The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits.

Results

Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data, which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction compared with a G matrix that did not. A model that included a polygenic effect did not recover the loss of prediction accuracy from the exclusion of X chromosome markers.

Conclusions

The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation.

12.

Background

To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields the conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which becomes infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker density increases.

Methods

A strategy is presented for single-step Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time; in one such strategy, a 58-fold speedup was achieved using 120 cores.
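
The imputation step is linear in the pedigree relationships: covariates of non-genotyped animals are predicted as M1_hat = A12 * inv(A22) * M2, where A12 and A22 are blocks of the numerator relationship matrix and M2 holds the centered genotypes of the genotyped animals. A toy sketch (matrix values invented for illustration):

```python
import numpy as np

def impute_marker_covariates_sketch(A12, A22, M2):
    """SSBR-style imputation of marker covariates for non-genotyped animals."""
    return A12 @ np.linalg.solve(A22, M2)    # A12 * inv(A22) * M2

# toy example: 2 non-genotyped and 3 genotyped animals, 4 markers
A22 = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.25],
                [0.25, 0.25, 1.00]])
A12 = np.array([[0.50, 0.50, 0.25],
                [0.25, 0.25, 0.50]])
M2 = np.array([[ 1.0, -1.0, 0.0,  1.0],     # centered genotype codes
               [ 0.0,  1.0, 1.0, -1.0],
               [-1.0,  0.0, 1.0,  0.0]])
print(impute_marker_covariates_sketch(A12, A22, M2))
```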

Discussion

In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so that they can be used for routine applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1297-9686-46-50) contains supplementary material, which is available to authorized users.

13.

Background

Autistic perception is characterized by atypical, and sometimes exceptional, performance in several low-level (e.g., discrimination) and mid-level (e.g., pattern matching) tasks in both visual and auditory domains. A factor that specifically affects perceptual abilities in autistic individuals should manifest as an autism-specific association between perceptual tasks. The first purpose of this study was to explore how perceptual performances are associated within or across processing levels and/or modalities. The second purpose was to determine whether general intelligence, the major factor that accounts for covariation in task performances in non-autistic individuals, equally controls perceptual abilities in autistic individuals.

Methods

We asked 46 autistic individuals and 46 typically developing controls to perform four tasks measuring low- or mid-level visual or auditory processing. Intelligence was measured with the Wechsler Intelligence Scale (FSIQ) and Raven's Progressive Matrices (RPM). We fitted linear regression models to compare task performances between groups and patterns of covariation between tasks. The addition of either Wechsler FSIQ or RPM to the regression models controlled for the effects of intelligence.

Results

In typically developing individuals, most perceptual tasks were associated with intelligence measured either by RPM or Wechsler FSIQ. The residual covariation between unimodal tasks, i.e. covariation not explained by intelligence, could be explained by a modality-specific factor. In the autistic group, residual covariation revealed the presence of a plurimodal factor specific to autism.

Conclusions

Autistic individuals show exceptional performance in some perceptual tasks. Here, we demonstrate the existence of specific, plurimodal covariation that does not depend on general intelligence (or “g” factor). Instead, this residual covariation is accounted for by a common perceptual process (or “p” factor), which may drive perceptual abilities differently in autistic and non-autistic individuals.

14.

Background

Recent studies have used contact data or three-dimensional (3D) genome reconstructions from Hi-C (chromosome conformation capture with next-generation sequencing) to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized data point pairs belonging to a functional annotation as “close” or “far” based on some threshold and then tested for enrichment of “close” pairs. We propose an alternative approach that avoids dichotomizing the data and instead directly estimates the significance of distances within the 3D reconstruction.
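
One simple way to make that idea concrete is a permutation test on mean pairwise 3D distances; the Python sketch below is an illustrative version under that interpretation, not the paper's exact procedure.

```python
import numpy as np

def colocalization_pvalue_sketch(coords, annot_idx, n_perm=10_000, seed=0):
    """Is the mean pairwise distance among annotated loci smaller than
    for random locus sets of the same size? (permutation p-value)"""
    rng = np.random.default_rng(seed)

    def mean_pairwise(idx):
        pts = coords[idx]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        return d[np.triu_indices(len(idx), k=1)].mean()

    annot_idx = np.asarray(annot_idx)
    observed = mean_pairwise(annot_idx)
    null = np.array([mean_pairwise(rng.choice(len(coords), len(annot_idx),
                                              replace=False))
                     for _ in range(n_perm)])
    return (np.sum(null <= observed) + 1) / (n_perm + 1)
```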

Results

We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae and compared the results to previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum; and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms.

Conclusions

Distance-based assessment replicated key findings, while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-992) contains supplementary material, which is available to authorized users.

15.

Background

Identifying individuals at increased risk for melanoma could potentially improve public health through targeted surveillance and early detection. Studies have separately demonstrated significant associations between melanoma risk, melanocortin 1 receptor (MC1R) polymorphisms, and indoor ultraviolet light (UV) exposure. Existing melanoma risk prediction models do not include these factors; therefore, we investigated their potential to improve the performance of a risk model.

Methods

Using 875 melanoma cases and 765 controls from the population-based Minnesota Skin Health Study, we compared the predictive ability of a clinical melanoma risk model (Model A) to an enhanced model (Model F) using receiver operating characteristic (ROC) curves. Model A used self-reported conventional risk factors, including mole phenotype categorized as “none”, “few”, “some” or “many” moles. Model F added MC1R genotype and measures of indoor and outdoor UV exposure to Model A. We also assessed the predictive ability of these models in subgroups stratified by mole phenotype: nevus-resistant (“none” and “few” moles) and nevus-prone (“some” and “many” moles).
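
Comparing a baseline and an enhanced risk model amounts to fitting nested logistic regressions and comparing their AUCs; a synthetic-data sketch (all predictors are invented stand-ins):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 1640                                   # cases + controls, as in the study
conventional = rng.normal(size=(n, 4))     # stand-ins for clinical risk factors
mc1r = rng.integers(0, 2, (n, 1))          # hypothetical genotype indicator
uv = rng.normal(size=(n, 2))               # hypothetical UV exposure measures
y = rng.integers(0, 2, n)                  # synthetic case/control status

model_a = LogisticRegression(max_iter=1000).fit(conventional, y)
X_f = np.hstack([conventional, mc1r, uv])
model_f = LogisticRegression(max_iter=1000).fit(X_f, y)

print("Model A AUC:", roc_auc_score(y, model_a.predict_proba(conventional)[:, 1]))
print("Model F AUC:", roc_auc_score(y, model_f.predict_proba(X_f)[:, 1]))
```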

Results

Model A (the reference model) yielded an area under the ROC curve (AUC) of 0.72 (95% CI: 0.69–0.74). Model F improved on this, with an AUC of 0.74 (95% CI: 0.71–0.76, p < 0.01). We also observed substantial variation in the AUCs of Models A and F when examined in the nevus-prone and nevus-resistant subgroups.

Conclusions

These results demonstrate that adding genotypic information and environmental exposure data can increase the predictive ability of a clinical melanoma risk model, especially among nevus-prone individuals.

16.

Background

With advances in next-generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. While numerous programs are available, they differ in sensitivity and generally have low sensitivity for detecting smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH is limited by the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons) by combining computational prediction algorithms with a high-resolution custom CGH array.

Results

We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification of ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 Genomes Project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability for detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate.
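
Most read-depth CNV callers start from the same primitive: a sample's normalized per-exon depth compared against a reference panel. A deliberately simplified sketch of that idea (not any of the published algorithms above):

```python
import numpy as np

def exon_cnv_flags_sketch(sample_depth, reference_depths, z=2.5):
    """Flag exons whose robust-z-scored log2 depth ratio against the
    reference median exceeds a threshold. Toy screen, illustration only."""
    sample = sample_depth / sample_depth.sum()
    ref = reference_depths / reference_depths.sum(axis=1, keepdims=True)
    log2r = np.log2((sample + 1e-9) / (np.median(ref, axis=0) + 1e-9))
    med = np.median(log2r)
    mad = np.median(np.abs(log2r - med))
    score = (log2r - med) / (1.4826 * mad + 1e-12)
    return np.where(score > z, "dup", np.where(score < -z, "del", "normal"))

rng = np.random.default_rng(8)
ref = rng.poisson(100, size=(20, 1000)).astype(float)
samp = ref[0].copy()
samp[100:103] *= 2                          # simulate a 3-exon duplication
print(exon_cnv_flags_sketch(samp, ref[1:])[95:110])
```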

Conclusions

In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-661) contains supplementary material, which is available to authorized users.

17.

Background

Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.

Methods

In an attempt to alleviate potential discrepancies between the assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that treated trait-by-line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction in two lines of brown layer hens (B1 and B2) and one line of white hens (W1). Each of the three lines had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated as the correlation between observed phenotypes and predicted breeding values.
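
As a concrete example of the kernel-learning side, RBF-kernel ridge regression on SNP genotypes takes a few lines with scikit-learn; the hyperparameters and data below are placeholders, not the study's settings.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(6)
X_train = rng.integers(0, 3, (1000, 500)).astype(float)  # synthetic genotypes
y_train = rng.normal(size=1000)                          # synthetic phenotypes
X_val = rng.integers(0, 3, (240, 500)).astype(float)

# alpha and gamma would need tuning (e.g. by cross-validation) in practice
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0 / X_train.shape[1])
krr.fit(X_train, y_train)
predicted_breeding_values = krr.predict(X_val)
```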

Results

When the training dataset included only data from the evaluated line, non-linear models yielded, at best, accuracy similar to that of linear models. In some cases, when a distantly related line was added, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.

Conclusions

Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models would deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait-by-line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line is not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0075-3) contains supplementary material, which is available to authorized users.

18.

Objective

Delusional disorder has traditionally been considered a psychotic syndrome that does not evolve into cognitive deterioration. However, to date, very little empirical research has explored cognitive executive components and memory processes in delusional disorder patients. This study investigated whether patients with delusional disorder are intact in both executive function components (such as flexibility, impulsivity and updating) and memory processes (such as immediate, short-term and long-term recall, learning and recognition).

Methods

A large sample of patients with delusional disorder (n = 86) and a group of healthy controls (n = 343) were compared with regard to their performance on a broad battery of neuropsychological tests, including the Trail Making Test, Wisconsin Card Sorting Test, Colour–Word Stroop Test, and Complutense Verbal Learning Test (TAVEC).

Results

When compared to controls, patients with delusional disorder showed significantly poorer performance on most cognitive tests. Thus, we demonstrate deficits in the flexibility, impulsivity and updating components of executive functions, as well as in memory processes. These findings remained significant after taking into account sex, age, educational level and premorbid IQ.

Conclusions

Our results do not support the traditional notion of patients with delusional disorder being cognitively intact.

19.

Background

Graphical representation of data is one of the most easily comprehended forms of explanation. The current study describes a simple visualization tool that may allow greater understanding of medical and epidemiological data.

Method

We propose a simple tool for the visualization of data, known as a “quilt plot”, that provides an alternative to presenting large volumes of data as frequency tables. Data from the Australian Needle and Syringe Program survey are used to illustrate “quilt plots”.
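
In essence, a quilt plot is a colour-coded grid of cell values; a matplotlib sketch on made-up survey-style data (labels and values are invented):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
years = [str(y) for y in range(2005, 2013)]
sites = ["Site A", "Site B", "Site C", "Site D", "Site E"]
values = rng.uniform(0, 1, size=(len(sites), len(years)))  # e.g. proportions

fig, ax = plt.subplots()
im = ax.imshow(values, cmap="Blues", aspect="auto")        # the "quilt"
ax.set_xticks(range(len(years)))
ax.set_xticklabels(years)
ax.set_yticks(range(len(sites)))
ax.set_yticklabels(sites)
fig.colorbar(im, ax=ax, label="proportion")
plt.show()
```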

Conclusion

Visualization of large volumes of data using “quilt plots” enhances the interpretation of medical and epidemiological data. Such intuitive presentations are particularly useful for the rapid assessment of problems in the data that cannot be readily identified by manual review. We recommend that, where possible, “quilt plots” be used alongside traditional quantitative assessments of the data as an exploratory data analysis tool.

20.

Background

Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation.

Methods

An algorithm was developed for the imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNPs, and works for large datasets. The method involves simple phasing rules, long-range phasing, haplotype library imputation, and segregation analysis.
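
The “simple phasing rules” rest on basic Mendelian logic: a homozygous parent fixes the allele it transmits. A toy illustration of one such rule (real long-range phasing and segregation analysis go far beyond this):

```python
def phase_from_parents_sketch(child, sire, dam):
    """Phase a child's genotype (allele counts 0/1/2) from parental
    genotypes where simple Mendelian rules suffice; toy example only."""
    def transmitted(parent):
        if parent == 0:
            return 0
        if parent == 2:
            return 1
        return None                # heterozygous or missing: allele unknown

    pat, mat = transmitted(sire), transmitted(dam)
    if pat is not None and mat is not None:
        return (pat, mat)          # fully phased (paternal, maternal) alleles
    if child in (0, 2):
        return (child // 2, child // 2)   # homozygous child phases itself
    return None                    # phase cannot be resolved by this rule

print(phase_from_parents_sketch(1, 2, 0))   # -> (1, 0)
```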

Results

Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored.

Conclusions

The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations.
