1.

MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression

He J Zelikovsky A 《Bioinformatics (Oxford, England)》2006,22(20):2558-2561

The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or (2) necessary reduction of the huge SNP sets (obtained, e.g. from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data based on multiple linear regression (MLR) is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects positions of the given number of informative SNPs based on a genotype sample population. The MLR SNP prediction algorithm predicts a complete genotype based on the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets including 10 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005). AVAILABILITY: MLR-Tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html
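The prediction step described in this abstract can be illustrated with a minimal sketch. It is not the MLR-tagging package's code: the function name `mlr_predict`, the 0/1/2 genotype coding, and the rounding of the fitted value to the nearest genotype are all assumptions made for illustration.

```python
import numpy as np

def mlr_predict(sample, tag_idx, target_idx, new_tags):
    """Predict one untyped SNP from tag SNPs by ordinary least squares.

    sample     : (n_individuals, n_snps) complete genotypes, coded 0/1/2 (assumed coding)
    tag_idx    : column indices of the tag (informative) SNPs
    target_idx : column index of the SNP to predict
    new_tags   : tag genotypes of the individual whose SNP is unknown
    """
    sample = np.asarray(sample, dtype=float)
    new_tags = np.asarray(new_tags, dtype=float)
    # Design matrix: an intercept column plus the tag SNP columns.
    X = np.column_stack([np.ones(len(sample)), sample[:, tag_idx]])
    y = sample[:, target_idx]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    raw = coef[0] + np.dot(coef[1:], new_tags)
    # Round the continuous fit to the nearest valid genotype code.
    return int(np.clip(np.rint(raw), 0, 2))

# Toy sample: SNP 1 is a copy of SNP 0, so tags {0, 2} predict it exactly.
sample = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [1, 1, 2], [0, 0, 2]]
print(mlr_predict(sample, [0, 2], 1, [2, 0]))
```

A stepwise tag selection in this spirit would repeatedly add the SNP whose inclusion most reduces the prediction error over the sample; that loop is omitted here.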

2.

The impact of missing and erroneous genotypes on tagging SNP selection and power of subsequent association tests

Liu W Zhao W Chase GA 《Human heredity》2006,61(1):31-44

OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, in the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have a severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.

3.

Choosing SNPs using feature selection

Phuong TM Lin Z Altman RB 《Journal of bioinformatics and computational biology》2006,4(2):241-257

A major challenge for genomewide disease association studies is the high cost of genotyping a large number of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as "tagging" SNPs, able to capture most variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve computation-intensive search for SNP subsets but discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here does not limit itself to using only correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that the number of tagging SNPs selected by our method is smaller than by using block-based methods. Supplementary website: http://htsnp.stanford.edu/FSFS/.

4.

Selection of haplotype variables from a high-density marker map for genomic prediction

Beatriz CD Cuyabano Guosheng Su Mogens S. Lund 《Genetics Selection Evolution》2015,47(1)

Background

Using haplotype blocks as predictors rather than individual single nucleotide polymorphisms (SNPs) may improve genomic predictions, since haplotypes are in stronger linkage disequilibrium with the quantitative trait loci than are individual SNPs. It has also been hypothesized that an appropriate selection of a subset of haplotype blocks can result in similar or better predictive ability than when using the whole set of haplotype blocks. This study investigated genomic prediction using a set of haplotype blocks that contained the SNPs with large effects estimated from an individual SNP prediction model. We analyzed protein yield, fertility and mastitis of Nordic Holstein cattle, and used high-density markers (about 770k SNPs). To reach an optimum number of haplotype variables for genomic prediction, predictions were performed using subsets of haplotype blocks that contained a range of 1000 to 50 000 main SNPs.

Results

The use of haplotype blocks improved the prediction reliabilities, even when selection focused on only a group of haplotype blocks. In this case, the use of haplotype blocks that contained the 20 000 to 50 000 SNPs with the highest effect was sufficient to outperform the model that used all individual SNPs as predictors (up to 1.3 % improvement in prediction reliability for mastitis, compared to individual SNP approach), and the achieved reliabilities were similar to those using all haplotype blocks available in the genome data (from 0.6 % lower to 0.8 % higher reliability).

Conclusions

Haplotype blocks used as predictors can improve the reliability of genomic prediction compared to the individual SNP model. Furthermore, the use of a subset of haplotype blocks that contains the main SNP effects from genomic data could be a feasible approach to genomic prediction in dairy cattle, given an increase in density of genotype data available. The predictive ability of the models that use a subset of haplotype blocks was similar to that obtained using either all haplotype blocks or all individual SNPs, with the benefit of having a much lower computational demand.

5.

Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes

Meng Z Zaykin DV Xu CF Wagner M Ehm MG 《American journal of human genetics》2003,73(1):115-130

The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimize the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton's "haplotype tagging SNP" selection method, which utilizes haplotype information. For both methods, we propose sliding window-based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype information about a small number of individuals for an initial set of SNPs and selection of an optimum subset of SNPs that could be efficiently genotyped on larger numbers of samples while retaining most of the genetic variation in samples. We identify suitable parameter combinations for the procedures, and we show that a sample size of 50-100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.

6.

Efficient selection of tagging single-nucleotide polymorphisms in multiple populations

Howie BN Carlson CS Rieder MJ Nickerson DA 《Human genetics》2006,120(1):58-68

Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
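The single-population building block mentioned here, greedy grouping of SNPs into bins by pairwise association, can be sketched roughly as follows. This is an illustration in the spirit of that description, not the LD-select or MultiPop-TagSelect implementation: the function name `greedy_ld_bins`, the 0/1/2 genotype coding, the use of squared genotype correlation as the association measure, and the 0.8 threshold are assumptions. All SNPs are assumed polymorphic in the sample.

```python
import numpy as np

def greedy_ld_bins(genotypes, r2_threshold=0.8):
    """Greedily group SNPs into bins of mutually well-tagged markers.

    genotypes : (n_individuals, n_snps) array, coded 0/1/2 (assumed coding)
    Returns a list of (tag_snp, bin_members) pairs.
    """
    G = np.asarray(genotypes, dtype=float)
    r2 = np.corrcoef(G, rowvar=False) ** 2  # pairwise squared correlation
    unbinned = set(range(G.shape[1]))
    bins = []
    while unbinned:
        # Pick the SNP that tags the most still-unbinned SNPs (lowest index on ties).
        best = max(sorted(unbinned),
                   key=lambda i: sum(r2[i, j] >= r2_threshold for j in unbinned))
        members = sorted(j for j in unbinned
                         if j == best or r2[best, j] >= r2_threshold)
        bins.append((best, members))
        unbinned -= set(members)
    return bins

# SNPs 0, 1 and 2 are perfectly correlated; SNP 3 is uncorrelated with them.
G = [[0, 0, 2, 0], [1, 1, 1, 0], [2, 2, 0, 0],
     [0, 0, 2, 1], [1, 1, 1, 1], [2, 2, 0, 1]]
print(greedy_ld_bins(G))
```

A multi-population extension would then have to choose, within each population's bins, tags that can be shared across populations; that step is not shown.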

7.

Genome-wide tagging SNPs with entropy-based Monte Carlo method.

Zhenqiu Liu Shili Lin Ming Tan 《Journal of computational biology》2006,13(9):1606-1614

The number of common single nucleotide polymorphisms (SNPs) in the human genome is estimated to be around 3-6 million. It is highly anticipated that the study of SNPs will help provide a means for elucidating the genetic component of complex diseases and variable drug responses. High-throughput technologies such as oligonucleotide arrays have produced an enormous amount of SNP data, which creates great challenges in genome-wide disease linkage and association studies. In this paper, we present an adaptation of the cross entropy (CE) method and propose an iterative CE Monte Carlo (CEMC) algorithm for tagging SNP selection. This differs from most SNP selection algorithms in the literature in that our method is independent of the notion of haplotype block. Thus, the method is applicable to whole genome SNP selection without prior knowledge of block boundaries. We applied this block-free algorithm to three large datasets (two simulated and one real) that are on the order of thousands of SNPs. The successful applications to these large scale datasets demonstrate that CEMC is computationally feasible for whole genome SNP selection. Furthermore, the results show that CEMC is significantly better than random selection, and it also outperformed another block-free selection algorithm for the dataset considered.

8.

Tag SNP selection using particle swarm optimization

Li‐Yeh Chuang Cheng‐San Yang Chang‐Hsuan Ho Cheng‐Hong Yang 《Biotechnology progress》2010,26(2):580-588

Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variations amongst species. With the genome‐wide SNP discovery, many genome‐wide association studies are likely to identify multiple genetic variants that are associated with complex diseases. However, genotyping all existing SNPs for a large number of samples is still challenging even though SNP arrays have been developed to facilitate the task. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome‐wide association studies. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be largely reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region, and consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA. We compared the prediction accuracy and time complexity of BPSO to STAMPA and an SVM‐based (SVM/STSA) method using publicly available data sets. Compared with STAMPA and SVM/STSA, BPSO effectively improved prediction accuracy for both smaller and larger scale data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data sets used. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010

9.

Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster

Ober U Ayroles JF Stone EA Richards S Zhu D Gibbs RA Stricker C Gianola D Schlather M Mackay TF Simianer H 《PLoS genetics》2012,8(5):e1002685

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
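The two ingredients named in this abstract, a genomic relationship matrix and a GBLUP prediction, can be sketched generically as below. The abstract does not say how the matrix was built, so the allele-frequency-centred construction used here is an assumption, as are the function names, the 0/1/2 coding, the use of the plain phenotype mean as the only fixed effect, and the shrinkage parameter `lam` (the ratio of residual to genetic variance), which would normally be estimated rather than supplied.

```python
import numpy as np

def genomic_relationship_matrix(M):
    """Build G = Z Z' / (2 * sum p(1-p)) from 0/1/2 genotypes (assumed construction)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per SNP
    Z = M - 2.0 * p                   # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(G, y, lam):
    """Predicted genetic values: G (G + lam*I)^-1 (y - mean(y))."""
    y = np.asarray(y, dtype=float)
    centred = y - y.mean()            # simple mean as the only fixed effect
    return G @ np.linalg.solve(G + lam * np.eye(len(y)), centred)

# Hypothetical toy data: 4 individuals, 3 SNPs, made-up phenotypes.
M = [[0, 1, 2], [1, 1, 0], [2, 1, 1], [1, 0, 1]]
G = genomic_relationship_matrix(M)
g_hat = gblup(G, [1.2, 0.7, 1.9, 1.0], lam=1.0)
print(G.shape, g_hat.shape)
```

In a cross-validation setting, predictive ability would be the correlation between `g_hat` for held-out individuals and their observed phenotypes; the fold handling is left out of this sketch.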

10.

Optimizing imputation of marker data from genotyping-by-sequencing (GBS) for genomic selection in non-model species: Rubber tree (Hevea brasiliensis) as a case study

《Genomics》2021,113(2):655-668

Genotyping-by-sequencing (GBS) provides the marker density required for genomic predictions (GP). However, GBS gives a high proportion of missing SNP data which, for species without a chromosome-level genome assembly, must be imputed without knowing the SNP physical positions. Here, we compared GP accuracy with seven map-independent and two map-dependent imputation approaches, and when using all SNPs against the subset of genetically mapped SNPs. We used two rubber tree (Hevea brasiliensis) datasets with three traits. The results showed that the best imputation approaches were LinkImputeR, Beagle and FImpute. Using the genetically mapped SNPs increased GP accuracy by 4.3%. Using LinkImputeR on all the markers allowed avoiding genetic mapping, with a slight decrease in GP accuracy. LinkImputeR gave the highest level of correctly imputed genotypes and its performances were further improved by its ability to define a subset of SNPs imputed optimally. These results will contribute to the efficient implementation of genomic selection with GBS. For Hevea, GBS is promising for rubber yield improvement, with GP accuracies reaching 0.52.

11.

TAGster: efficient selection of LD tag SNPs in single or multiple populations

Xu Z Kaplan NL Taylor JA 《Bioinformatics (Oxford, England)》2007,23(23):3254-3255

Genetic association studies increasingly rely on the use of linkage disequilibrium (LD) tag SNPs to reduce genotyping costs. We developed a software package TAGster to select, evaluate and visualize LD tag SNPs both for single and multiple populations. We implement several strategies to improve the efficiency of current LD tag SNP selection algorithms: (1) we modify the tag SNP selection procedure of Carlson et al. to improve selection efficiency and further generalize it to multiple populations. (2) We propose a redundant SNP elimination step to speed up the exhaustive tag SNP search algorithm proposed by Qin et al. (3) We present an additional multiple population tag SNP selection algorithm based on the framework of Howie et al., but using our modified exhaustive search procedure. We evaluate these methods using resequenced candidate gene data from the Environmental Genome Project and show improvements in both computational and tagging efficiency. AVAILABILITY: The software package TAGster is freely available at http://www.niehs.nih.gov/research/resources/software/tagster/

12.

Genomic Selection Using Low-Density Marker Panels

D. Habier R. L. Fernando J. C. M. Dekkers 《Genetics》2009,182(1):343-353

Genomic selection (GS) using high-density single-nucleotide polymorphisms (SNPs) is promising to improve response to selection in populations that are under artificial selection. High-density SNP genotyping of all selection candidates each generation, however, may not be cost effective. Smaller panels with SNPs that show strong associations with phenotype can be used, but this may require separate SNPs for each trait and each population. As an alternative, we propose to use a panel of evenly spaced low-density SNPs across the genome to estimate genome-assisted breeding values of selection candidates in pedigreed populations. The principle of this approach is to utilize cosegregation information from low-density SNPs to track effects of high-density SNP alleles within families. Simulations were used to analyze the loss of accuracy of estimated breeding values from using evenly spaced and selected SNP panels compared to using all high-density SNPs in a Bayesian analysis. Forward stepwise selection and a Bayesian approach were used to select SNPs. Loss of accuracy was nearly independent of the number of simulated quantitative trait loci (QTL) with evenly spaced SNPs, but increased with number of QTL for the selected SNP panels. Loss of accuracy with evenly spaced SNPs increased steadily over generations but was constant when the smaller number of individuals that are selected for breeding each generation were also genotyped using the high-density SNP panel. With equal numbers of low-density SNPs, panels with SNPs selected on the basis of the Bayesian approach had the smallest loss in accuracy for a single trait, but a panel with evenly spaced SNPs at 10 cM was only slightly worse, whereas a panel with SNPs selected by forward stepwise selection was inferior. Panels with evenly spaced SNPs can, however, be used across traits and populations and their performance is independent of the number of QTL affecting the trait and of the methods used to estimate effects in the training data and are, therefore, preferred for broad applications in pedigreed populations under artificial selection.
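Picking an evenly spaced low-density panel from a high-density map can be sketched as below. This only illustrates the panel layout, not the cosegregation-based breeding value estimation; the function name `evenly_spaced_panel`, the single-chromosome input, and the rule of taking the SNP nearest to each grid point are assumptions.

```python
def evenly_spaced_panel(positions_cm, spacing_cm=10.0):
    """Return indices of the SNPs closest to a regular grid on one chromosome.

    positions_cm : map positions (cM) of the high-density SNPs, sorted ascending
    spacing_cm   : target distance between panel SNPs (10 cM as in the abstract)
    """
    if not positions_cm:
        return []
    chosen = []
    k = 0
    while k * spacing_cm <= positions_cm[-1]:
        target = k * spacing_cm
        # Nearest high-density SNP to this grid point (lowest index on ties).
        nearest = min(range(len(positions_cm)),
                      key=lambda i: abs(positions_cm[i] - target))
        if nearest not in chosen:
            chosen.append(nearest)
        k += 1
    return chosen

# Hypothetical map positions for seven SNPs on one chromosome.
print(evenly_spaced_panel([0.5, 3.0, 9.8, 12.0, 19.5, 20.4, 31.0]))  # [0, 2, 5, 6]
```

Because the grid depends only on map position, the same panel can be reused across traits, which is the property the abstract highlights.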

13.

Quantitative trait prediction based on genetic marker-array data, a simulation study

Yip WK Lange C 《Bioinformatics (Oxford, England)》2011,27(6):745-748

MOTIVATION: Using simulation studies for quantitative trait loci (QTL), we evaluate the prediction quality of regression models that include as covariates single-nucleotide polymorphism (SNP) genetic markers which did not achieve genome-wide significance in the original genome-wide association study, but were among the SNPs with the smallest P-value for the selected association test. We compare the results of such regression models to the standard approach which is to include only SNPs that achieve genome-wide significance. Using mean square prediction error as the model metric, our simulation results suggest that by using the coefficient of determination (R²) value as a guideline to increase or reduce the number of SNPs included in the regression model, we can achieve better prediction quality than the standard approach. However, important parameters such as trait heritability, the approximate number of QTLs, etc. have to be determined from previous studies or have to be estimated accurately.

14.

GStream: Improving SNP and CNV Coverage on Genome-Wide Association Studies

Arnald Alonso Sara Marsal Raül Tortosa Oriol Canela-Xandri Antonio Julià 《PloS one》2013,8(7)

We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs, previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method.

15.

Efficiency of genomic selection for tomato fruit quality

Janejira Duangjit Mathilde Causse Christopher Sauvage 《Molecular breeding : new strategies in plant improvement》2016,36(3):29

Fruit quality is polygenic; each component has variable heritability and is difficult to assess. Genomic selection, which allows the prediction of phenotypes based on the whole-genome genotype, could vastly help to improve fruit quality. The goal of this study is to evaluate the accuracy of genomic selection for several metabolomic and quality traits by cross-validation and to estimate the impact of different factors on its accuracy. We analyzed data from 45 phenotypic traits and genotypic data obtained from a previous study of genetic association on a collection of 163 tomato accessions. We tested the influence of (1) the size of training population, (2) the number and density of molecular markers and (3) individual relatedness on the accuracy of prediction. The prediction accuracy of phenotypic values was largely related to the heritability of the traits. The size of training population increased the accuracy of predictions. Using 122 accessions and 5995 single nucleotide polymorphisms (SNPs) was the optimal condition. The density of markers and their numbers also affected the accuracy of the prediction. Using 2313 SNP markers distributed 0.1 cM or more apart from each other reduced the accuracy of prediction, and no gain in prediction accuracy was found when more markers were used in the model. Additionally, the more accessions were related, the more accurate were the predictions. Finally, the structure of the population negatively affected the prediction accuracy. In conclusion, the results obtained by cross-validation illustrated the effect of several parameters on the accuracy of prediction and revealed the potential of genomic selection in tomato breeding programs.

16.

Integrating Milk Metabolite Profile Information for the Prediction of Traditional Milk Traits Based on SNP Information for Holstein Cows

Nina Melzer Dörte Wittenburg Dirk Repsilber 《PloS one》2013,8(8)

In this study the benefit of metabolome level analysis for the prediction of genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps: First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine if detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction, but requires less than 1% of the total number of SNPs (40,317). Significantly more important SNPs in known QTL regions were detected using our approach compared to the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype).

17.

Increasing power of genome-wide association studies by collecting additional single-nucleotide polymorphisms

Kostem E Lozano JA Eskin E 《Genetics》2011,188(2):449-460

Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method shows superior performance to the correlation- and distance-based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.

18.

When less can be better: How can we make genomic selection more cost-effective and accurate in barley?

Amina Abed Paulino Pérez-Rodríguez José Crossa François Belzile 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2018,131(9):1873-1890

Key message

We were able to obtain good prediction accuracy in genomic selection with ~2000 GBS-derived SNPs. SNPs in genic regions did not improve prediction accuracy compared to SNPs in intergenic regions.

Abstract

Since genotyping can represent an important cost in genomic selection, it is important to minimize it without compromising the accuracy of predictions. The objectives of the present study were to explore how a decrease in the unit cost of genotyping impacted: (1) the number of single nucleotide polymorphism (SNP) markers; (2) the accuracy of the resulting genotypic data; (3) the extent of coverage on both physical and genetic maps; and (4) the prediction accuracy (PA) for six important traits in barley. Variations on the genotyping by sequencing protocol were used to generate 16 SNP sets ranging from ~500 to ~35,000 SNPs. The accuracy of SNP genotypes fluctuated between 95 and 99%. Marker distribution on the physical map was highly skewed toward the terminal regions, whereas a fairly uniform coverage of the genetic map was achieved with all but the smallest set of SNPs. We estimated the PA using three statistical models capturing (or not) the epistatic effect; the one modeling both additivity and epistasis was selected as the best model. The PA obtained with the different SNP sets was measured and found to remain stable, except with the smallest set, where a significant decrease was observed. Finally, we examined if the localization of SNP loci (genic vs. intergenic) affected the PA. No gain in PA was observed using SNPs located in genic regions. In summary, we found that there is considerable scope for decreasing the cost of genotyping in barley (to capture ~2000 SNPs) without loss of PA.

19.

Linkage disequilibrium among commonly genotyped SNP variants detected from bull sequence

W. M. Snelling L. A. Kuehn B. N. Keel R. M. Thallman G. L. Bennett 《Animal genetics》2017,48(5):516-522

Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNPs genotyped by currently available arrays used for genomic selection. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequence variant genotypes have not been reported. Improvement in accuracy of predicted breeding values may be limited by the accuracy of imputed sequence variants. Using genotypes of SNPs on a high‐density array and non‐synonymous SNPs detected in sequence from influential sires of a multibreed population, results of this examination suggest that linkage disequilibrium between non‐synonymous and array SNPs may be insufficient for accurate imputation from the array to sequence. In contrast to 75% of array SNPs being strongly correlated to another SNP on the array, less than 25% of the non‐synonymous SNPs were strongly correlated to an array SNP. When correlations between non‐synonymous and array SNPs were strong, distances between the SNPs were greater than separation that might be expected based on linkage disequilibrium decay. Consistently near‐perfect whole‐genome linkage disequilibrium between the full array and each non‐synonymous SNP within the sequenced bulls suggests that whole‐genome approaches to infer sequence variants might be more accurate than imputation based on local haplotypes. Opportunity for strong linkage disequilibrium between sequence and array SNPs may be limited by discrepancies in allele frequency distributions, so investigating alternate genotyping approaches and panels providing greater chances of frequency‐matched SNPs strongly correlated to sequence variants is also warranted. Genotypes used for this study are available from https://www.animalgenome.org/repository/pub/USDA2017.0519/.

20.

Genome‐wide association and genomic prediction for biomass yield in a genetically diverse Miscanthus sinensis germplasm panel phenotyped at five locations in Asia and North America

Lindsay V. Clark Maria S. Dwiyanti Kossonou G. Anzoua Joe E. Brummer Bimal Kumar Ghimire Katarzyna Głowacka Megan Hall Kweon Heo Xiaoli Jin Alexander E. Lipka Junhua Peng Toshihiko Yamada Ji Hye Yoo Chang Yeon Yu Hua Zhao Stephen P. Long Erik J. Sacks 《Global Change Biology Bioenergy》2019,11(8):988-1007

To improve the efficiency of breeding of Miscanthus for biomass yield, there is a need to develop genomics‐assisted selection for this long‐lived perennial crop by relating genotype to phenotype and breeding value across a broad range of environments. We present the first genome‐wide association (GWA) and genomic prediction study of Miscanthus that utilizes multilocation phenotypic data. A panel of 568 Miscanthus sinensis accessions was genotyped with 46,177 single nucleotide polymorphisms (SNPs) and evaluated at one subtropical and five temperate locations over 3 years for biomass yield and 14 yield‐component traits. GWA and genomic prediction were performed separately for different years of data in order to assess reproducibility. The analyses were also performed for individual field trial locations, as well as combined phenotypic data across groups of locations. GWA analyses identified 27 significant SNPs for yield, and a total of 504 associations across 298 unique SNPs across all traits, sites, and years. For yield, the greatest number of significant SNPs was identified by combining phenotypic data across all six locations. For some of the other yield‐component traits, greater numbers of significant SNPs were obtained from single site data, although the number of significant SNPs varied greatly from site to site. Candidate genes were identified. Accounting for population structure, genomic prediction accuracies for biomass yield ranged from 0.31 to 0.35 across five northern sites and from 0.13 to 0.18 for the subtropical location, depending on the estimation method. Genomic prediction accuracies of all traits were similar for single‐location and multilocation data, suggesting that genomic selection will be useful for breeding broadly adapted M. sinensis as well as M. sinensis optimized for specific climates. All of our data, including DNA sequences flanking each SNP, are publicly available. By facilitating genomic selection in M. sinensis and Miscanthus × giganteus, our results will accelerate the breeding of these species for biomass in diverse environments.