Similar articles
1.
The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or (2) necessary reduction of the huge SNP sets (obtained, e.g. from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data based on multiple linear regression (MLR) is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects positions of the given number of informative SNPs based on a genotype sample population. The MLR SNP prediction algorithm predicts a complete genotype based on the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets including 10 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005). AVAILABILITY: MLR-Tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html

2.

Background  

Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets.
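The entropy criterion in this abstract can be illustrated with a small sketch. It is not the authors' method: it assumes empirical haplotype frequencies in place of the probabilistic model the abstract calls for, and it uses a greedy search, which is not guaranteed to find the optimal tag set. The function names and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, columns):
    """Empirical Shannon entropy (bits) of the haplotypes restricted to `columns`.

    Assumption: observed frequencies stand in for a fitted haplotype model.
    """
    counts = Counter(tuple(h[c] for c in columns) for h in haplotypes)
    n = len(haplotypes)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def greedy_entropy_tags(haplotypes, n_tags):
    """Greedily add the SNP that most increases the joint entropy of the tag set."""
    n_snps = len(haplotypes[0])
    tags = []
    for _ in range(n_tags):
        candidates = [c for c in range(n_snps) if c not in tags]
        best = max(candidates, key=lambda c: joint_entropy(haplotypes, tags + [c]))
        tags.append(best)
    return tags


# Toy phased haplotypes over four biallelic SNPs (hypothetical data).
haps = ["0011", "0011", "1100", "1101", "0110", "1001"]
tags = greedy_entropy_tags(haps, 2)
print(tags, joint_entropy(haps, tags))
```

The constraint on the number of SNPs appears here as `n_tags`; a model-based approach would replace `joint_entropy` with the entropy under the fitted haplotype model.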

3.
In a de novo genotyping‐by‐sequencing (GBS) analysis of short, 64‐base tag‐level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag‐level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635‐line diversity panel were used to infer chromosome‐level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high‐resolution genome analysis and genomic selection in oats. A combined genome‐wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component‐based genome‐wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to Northern American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS‐derived markers facilitate genome analysis and genomic selection in oat.

4.
Multiallelic short tandem repeat polymorphisms, or microsatellites, are useful markers in genome wide scans to identify chromosomal regions containing genes underlying disease loci. The biallelic single nucleotide polymorphism (SNP) can be used to fine map previously identified large candidate regions or to test functional candidate genes by association analysis. In the GenomEUtwin project the population based impact of susceptibility genes for six multifactorial traits will be studied. A genome wide panel of informative human microsatellite markers will be analyzed by fluorescent capillary electrophoresis in well characterized twin and population samples. In contrast to microsatellites, selection of the most informative panels of SNPs is hampered by imperfect data on the allele frequencies and population distribution of SNP markers in the databases. Therefore, selection of SNPs requires a substantial amount of bioinformatics, and the SNPs need to be validated experimentally in the relevant populations prior to genotyping large sample sets. In the GenomEUtwin project, large scale genotyping of SNPs will be performed using the SNPstreamUHT and MassARRAY genotyping systems that are based on the primer extension reaction principle combined with fluorescent and mass spectrometric detection, respectively. Production of the genotyping data will be a joint effort by GenomEUtwin partners at the University of Helsinki, the National Public Health Institute in Helsinki, Finland and Uppsala University, Sweden. All genotyping data will be stored in a common database established specifically for the GenomEUtwin project, from where it can be accessed by the twin research centres that provided the samples for genotyping.

5.
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low‐density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome‐wide association analysis and genomic prediction in Labrador Retrievers.
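The accuracy measure in this abstract, the correlation between true and imputed genotypes after masking part of the high-density array, can be sketched as follows. This is only an illustration: the even "keep one SNP in eight" masking pattern is an assumption (the abstract does not say how SNPs were chosen for masking), the genotype vectors are invented, and no imputation is performed here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def masked_positions(n_snps, keep_every=8):
    """Indices masked on a simulated low-density array.

    Assumption: every `keep_every`-th SNP is kept, so about 87.5% are masked
    when keep_every is 8.
    """
    return [i for i in range(n_snps) if i % keep_every != 0]


masked = masked_positions(5826)  # 5826 SNPs on chromosome 1, as in the abstract
print(len(masked) / 5826)        # fraction of SNPs masked

# Hypothetical genotypes (0/1/2 allele counts) at masked SNPs for one dog.
true_gt = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_gt = [0, 1, 2, 1, 1, 0, 1, 2]
accuracy = pearson(true_gt, imputed_gt)
print(accuracy)
```

In a real evaluation the imputed vector would come from an imputation program run on the low-density genotypes, and the correlation would be computed over all masked SNPs and validation animals.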

6.
SNP arrays are widely used in genetic research and agricultural genomics applications, and the quality of SNP genotyping data is of paramount importance. In the present study, SNP genotyping concordance and discordance were evaluated for commercial bovine SNP arrays based on two types of quality assurance (QA) samples provided by Neogen GeneSeek. The genotyping discordance rates (GDRs) between chips were on average between 0.06% and 0.37% based on the QA type I data and between 0.05% and 0.15% based on the QA type II data. The average genotyping error rate (GER) pertaining to single SNP chips, based on the QA type II data, varied between 0.02% and 0.08% per SNP and between 0.01% and 0.06% per sample. These results indicate that genotyping concordance rate was high (i.e. from 99.63% to 99.99%). Nevertheless, mitochondrial and Y chromosome SNPs had considerably elevated GDRs and GERs compared to the SNPs on the 29 autosomes and X chromosome. The majority of genotyping errors resulted from single allotyping errors, which also included the opposite instances for allele ‘dropout’ (i.e. from AB to AA or BB). Simultaneous allotyping errors on both alleles (e.g. mistaking AA for BB or vice versa) were relatively rare. Finally, a list of SNPs with a GER greater than 1% is provided. Interpretation of association effects of these SNPs, for example in genome‐wide association studies, needs to be taken with caution. The genotyping concordance information needs to be considered in the optimal design of future bovine SNP arrays.
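The rates quoted in this abstract are simple proportions, and a minimal sketch makes the arithmetic concrete. The function names are hypothetical, and excluding missing calls from the denominator is an assumption that the abstract does not confirm.

```python
def discordance_rate(calls_a, calls_b):
    """Percent of SNPs, called on both chips, whose genotypes disagree.

    Assumption: SNPs with a missing call (None) on either chip are excluded.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    mismatches = sum(sorted(a) != sorted(b) for a, b in pairs)
    return 100.0 * mismatches / len(pairs)


def classify_error(true_gt, called_gt):
    """Label a call as correct, a single-allele error, or a double-allele error."""
    wrong = sum(t != c for t, c in zip(sorted(true_gt), sorted(called_gt)))
    return {0: "correct", 1: "single", 2: "double"}[wrong]


# Hypothetical calls for the same sample on two chips.
chip_a = ["AA", "AB", "BB", "AB", None]
chip_b = ["AA", "AB", "BB", "AA", "AB"]
print(discordance_rate(chip_a, chip_b))

# Concordance is the complement of discordance: a 0.37% GDR gives 99.63%.
print(round(100 - 0.37, 2))

print(classify_error("AB", "AA"))  # one allele miscalled (e.g. dropout)
print(classify_error("AA", "BB"))  # both alleles miscalled
```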

7.
Cultivated soybean (Glycine max) suffers from a narrow germplasm relative to other crop species, probably because of under‐use of wild soybean (Glycine soja) as a breeding resource. Use of a single nucleotide polymorphism (SNP) genotyping array is a promising method for dissecting cultivated and wild germplasms to identify important adaptive genes through high‐density genetic mapping and genome‐wide association studies. Here we describe a large soybean SNP array for use in diversity analyses, linkage mapping and genome‐wide association analyses. More than four million high‐quality SNPs identified from high‐depth genome re‐sequencing of 16 soybean accessions and low‐depth genome re‐sequencing of 31 soybean accessions were used to select 180 961 SNPs for creation of the Axiom® SoyaSNP array. Validation analysis for a set of 222 diverse soybean lines showed that 170 223 markers were of good quality for genotyping. Phylogenetic and allele frequency analyses of the validation set data indicated that accessions showing an intermediate morphology between cultivated and wild soybeans collected in Korea were natural hybrids. More than 90 unanchored scaffolds in the current soybean reference sequence were assigned to chromosomes using this array. Finally, dense average spacing and preferential distribution of the SNPs in gene‐rich chromosomal regions suggest that this array may be suitable for genome‐wide association studies of soybean germplasm. Taken together, these results suggest that use of this array may be a powerful method for soybean genetic analyses relating to many aspects of soybean breeding.

8.
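SNPs (single nucleotide polymorphisms) are extremely widely distributed in the pig genome, with an average spacing of 300 to 400 bp, and the relevant databases already hold about 550,000 entries. Pig genome sequencing has made substantial progress, large-scale searches for SNPs in genomic and EST (expressed sequence tag) sequences are under way, and SNP chips for whole-genome use in pigs have been established. Building on this, SNP-marker-based genetic map construction, QTL (quantitative trait loci) mapping, genetic diversity assessment and genome-wide association analysis in pigs have followed in succession.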

9.
The rapid development and application of molecular marker assays have facilitated genomic selection and genome‐wide linkage and association studies in wheat breeding. Although PCR‐based markers (e.g. simple sequence repeats and functional markers) and genotyping by sequencing have contributed greatly to gene discovery and marker‐assisted selection, the release of a more accurate and complete bread wheat reference genome has resulted in the design of single‐nucleotide polymorphism (SNP) arrays based on different densities or application targets. Here, we evaluated seven types of wheat SNP arrays in terms of their SNP number, distribution, density, associated genes, heterozygosity and application. The results suggested that the Wheat 660K SNP array contained the highest percentage (99.05%) of genome‐specific SNPs with reliable physical positions. SNP density analysis indicated that the SNPs were almost evenly distributed across the whole genome. In addition, 229 266 SNPs in the Wheat 660K SNP array were located in 66 834 annotated gene or promoter intervals. The annotated genes revealed by the Wheat 660K SNP array covered almost all genes revealed by the Wheat 35K (97.44%), 55K (99.73%), 90K (86.9%) and 820K (85.3%) SNP arrays. Therefore, the Wheat 660K SNP array could act as a substitute for the other six arrays and shows promise for a wide range of possible applications. In summary, the Wheat 660K SNP array is reliable and cost‐effective and may be the best choice for targeted genotyping and marker‐assisted selection in wheat genetic improvement.

10.
Pear (Pyrus; 2n = 34), the third most important temperate fruit crop, has great nutritional and economic value. Despite the availability of many genomic resources in pear, it is challenging to genotype novel germplasm resources and breeding progeny in a timely and cost‐effective manner. Genotyping arrays can provide fast, efficient and high‐throughput genetic characterization of diverse germplasm, genetic mapping and breeding populations. We present here 200K AXIOM® PyrSNP, a large‐scale single nucleotide polymorphism (SNP) genotyping array to facilitate genotyping of Pyrus species. A diverse panel of 113 re‐sequenced pear genotypes was used to discover SNPs to promote increased adoption of the array. A set of 188 diverse accessions and an F1 population of 98 individuals from ‘Cuiguan’ × ‘Starkrimson’ was genotyped with the array to assess its effectiveness. A large majority of SNPs (166 335 or 83%) are of high quality. The high density and uniform distribution of the array SNPs facilitated prediction of centromeric regions on 17 pear chromosomes, and significantly improved the genome assembly from 75.5% to 81.4% based on genetic mapping. Identification of a gene associated with flowering time and candidate genes linked to size of fruit core via genome wide association studies showed the usefulness of the array in pear genetic research. The newly developed high‐density SNP array presents an important tool for rapid and high‐throughput genotyping in pear for genetic map construction, QTL identification and genomic selection.

11.
High‐density SNP genotyping arrays can be designed for any species given sufficient sequence information of high quality. Two high‐density SNP arrays relying on the Infinium iSelect technology (Illumina) were designed for use in the conifer white spruce (Picea glauca). One array contained 7338 segregating SNPs representative of 2814 genes of various molecular functional classes for main uses in genetic association and population genetics studies. The other one contained 9559 segregating SNPs representative of 9543 genes for main uses in population genetics, linkage mapping of the genome and genomic prediction. The SNPs assayed were discovered from various sources of gene resequencing data. SNPs predicted from high‐quality sequences derived from genomic DNA reached a genotyping success rate of 64.7%. Nonsingleton in silico SNPs (i.e. a sequence polymorphism present in at least two reads) predicted from expressed sequence tags obtained with the Roche 454 technology and Illumina GAII analyser resulted in a similar genotyping success rate of 71.6% when the deepest alignment was used and the most favourable SNP probe per gene was selected. A variable proportion of these SNPs was shared by other nordic and subtropical spruce species from North America and Europe. The number of shared SNPs was inversely proportional to phylogenetic divergence and standing genetic variation in the recipient species, but positively related to allele frequency in P. glauca natural populations. These validated SNP resources should open up new avenues for population genetics and comparative genetic mapping at a genomic scale in spruce species.

12.
Genetic variation analysis holds much promise as a basis for disease-gene association. However, due to the tremendous number of candidate single nucleotide polymorphisms (SNPs), there is a clear need to expedite genotyping by selecting and considering only a subset of all SNPs. This process is known as tagging SNP selection. Several methods for tagging SNP selection have been proposed, and have shown promising results. However, most of them rely on strong assumptions such as prior block-partitioning, bi-allelic SNPs, or a fixed number or location of tagging SNPs. We introduce BNTagger, a new method for tagging SNP selection, based on conditional independence among SNPs. Using the formalism of Bayesian networks (BNs), our system aims to select a subset of independent and highly predictive SNPs. Similar to previous prediction-based methods, we aim to maximize the prediction accuracy of tagging SNPs, but unlike them, we neither fix the number nor the location of predictive tagging SNPs, nor require SNPs to be bi-allelic. In addition, for newly-genotyped samples, BNTagger directly uses genotype data as input, while producing as output haplotype data of all SNPs. Using three public data sets, we compare the prediction performance of our method to that of three state-of-the-art tagging SNP selection methods. The results demonstrate that our method consistently improves upon previous methods in terms of prediction accuracy. Moreover, our method retains its good performance even when a very small number of tagging SNPs are used.

13.
Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNPs genotyped by currently available arrays used for genomic selection. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequence variant genotypes have not been reported. Improvement in accuracy of predicted breeding values may be limited by the accuracy of imputed sequence variants. Using genotypes of SNPs on a high‐density array and non‐synonymous SNPs detected in sequence from influential sires of a multibreed population, results of this examination suggest that linkage disequilibrium between non‐synonymous and array SNPs may be insufficient for accurate imputation from the array to sequence. In contrast to 75% of array SNPs being strongly correlated to another SNP on the array, less than 25% of the non‐synonymous SNPs were strongly correlated to an array SNP. When correlations between non‐synonymous and array SNPs were strong, distances between the SNPs were greater than separation that might be expected based on linkage disequilibrium decay. Consistently near‐perfect whole‐genome linkage disequilibrium between the full array and each non‐synonymous SNP within the sequenced bulls suggests that whole‐genome approaches to infer sequence variants might be more accurate than imputation based on local haplotypes. Opportunity for strong linkage disequilibrium between sequence and array SNPs may be limited by discrepancies in allele frequency distributions, so investigating alternate genotyping approaches and panels providing greater chances of frequency‐matched SNPs strongly correlated to sequence variants is also warranted. Genotypes used for this study are available from https://www.animalgenome.org/repository/pub/USDA2017.0519/.

14.
Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in population genetics due to a variety of advantages relative to other markers, including higher genomic density, data quality, reproducibility and genotyping efficiency, as well as ease of portability between laboratories. Advances in sequencing technology and methodologies to reduce genomic representation have made the isolation of SNPs feasible for nonmodel organisms. RNA‐seq is one such technique for the discovery of SNPs and development of markers for large‐scale genotyping. Here, we report the development of 192 validated SNP markers for parentage analysis in Tripterygion delaisi (the black‐faced blenny), a small rocky‐shore fish from the Mediterranean Sea. RNA‐seq data for 15 individual samples were used for SNP discovery by applying a series of selection criteria. Genotypes were then collected from 1599 individuals from the same population with the resulting loci. Differences in heterozygosity and allele frequencies were found between the two data sets. Heterozygosity was lower, on average, in the population sample, and the mean difference between the frequencies of particular alleles in the two data sets was 0.135 ± 0.100. We used bootstrap resampling of the sequence data to predict appropriate sample sizes for SNP discovery. As cDNA library production is time‐consuming and expensive, we suggest that using seven individuals for RNA sequencing reduces the probability of discarding highly informative SNP loci, due to lack of observed polymorphism, whereas use of more than 12 samples does not considerably improve prediction of true allele frequencies.

15.
Kostem E  Lozano JA  Eskin E 《Genetics》2011,188(2):449-460
Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method shows superior performance to the correlation- and distance-based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.

16.
Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. Potentially, this constraint can be overcome by genotyping selection candidates for a low density (low cost) panel of SNPs with sparse genotype coverage, imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole genome re‐sequence data from moderate density SNP panels. Such a strategy could lead to an accurate prediction of genomic estimated breeding values across breeds, for example. We used genotypes from 48 640 (50K) SNPs genotyped in four sheep breeds to investigate both the accuracy of imputation of the 50K SNPs from low density SNP panels, as well as prospects for imputing very dense or whole genome re‐sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Accuracy of imputation was low if the sparse panel had less than 5000 (5K) markers. Across breeds, it was clear that the accuracy of imputing from sparse marker panels to 50K was higher if the genetic diversity within a breed was lower, such that relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when the imputation was performed within breed rather than when pooling all the data, despite the fact that the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower at 71%, despite a large number of animals with full genotypes (2215) being used as a reference. 
For all breeds, the relationship of individuals to the reference explained up to 64% of the variation in accuracy of imputation, demonstrating that accuracy of imputation can be increased if sires and other ancestors of the individuals to be imputed are included in the reference population. The accuracy of imputation could also be increased if pedigree information was available and was used in tracking inheritance of large chromosome segments within families. In our study, we only considered methods of imputation based on population‐wide linkage disequilibrium (largely because the pedigree for some of the populations was incomplete). Finally, in the scenarios designed to mimic imputation of high density or whole genome re‐sequence data from the 50K panel, the accuracy of imputation was much higher (86–96%). This is promising, suggesting that in silico genome re‐sequencing is possible in sheep if a suitable pool of key ancestors is sequenced for each breed.

17.
Farmed Atlantic salmon (Salmo salar) is a globally important production species, including in Australia where breeding and selection has been in progress since the 1960s. The recent development of SNP genotyping platforms means genome‐wide association and genomic prediction can now be implemented to speed genetic gain. As a precursor, this study collected genotypes at 218 132 SNPs in 777 fish from a Tasmanian breeding population to assess levels of genetic diversity, the strength of linkage disequilibrium (LD) and imputation accuracy. Genetic diversity in Tasmanian Atlantic salmon was lower than observed within European populations when compared using four diversity metrics. The distribution of allele frequencies also showed a clear difference, with the Tasmanian animals carrying an excess of low minor allele frequency variants. The strength of observed LD was high at short distances (<25 kb) and remained above background for marker pairs separated by large chromosomal distances (hundreds of kb), in sharp contrast to the European Atlantic salmon tested. Genotypes were used to evaluate the accuracy of imputation from low density (0.5 to 5 K) up to increased density SNP sets (78 K). This revealed high imputation accuracies (0.89–0.97), suggesting that the use of low density SNP sets will be a successful approach for genomic prediction in this population. The long‐range LD, comparatively low genetic diversity and high imputation accuracy in Tasmanian salmon is consistent with known aspects of their population history, which involved a small founding population and an absence of subsequent introgression. The findings of this study represent an important first step towards the design of methods to apply genomics in this economically important population.

18.
We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs, previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method.

19.

Key message

We were able to obtain good prediction accuracy in genomic selection with ~2000 GBS-derived SNPs. SNPs in genic regions did not improve prediction accuracy compared to SNPs in intergenic regions.

Abstract

Since genotyping can represent an important cost in genomic selection, it is important to minimize it without compromising the accuracy of predictions. The objectives of the present study were to explore how a decrease in the unit cost of genotyping impacted: (1) the number of single nucleotide polymorphism (SNP) markers; (2) the accuracy of the resulting genotypic data; (3) the extent of coverage on both physical and genetic maps; and (4) the prediction accuracy (PA) for six important traits in barley. Variations on the genotyping by sequencing protocol were used to generate 16 SNP sets ranging from ~500 to ~35,000 SNPs. The accuracy of SNP genotypes fluctuated between 95 and 99%. Marker distribution on the physical map was highly skewed toward the terminal regions, whereas a fairly uniform coverage of the genetic map was achieved with all but the smallest set of SNPs. We estimated the PA using three statistical models capturing (or not) the epistatic effect; the one modeling both additivity and epistasis was selected as the best model. The PA obtained with the different SNP sets was measured and found to remain stable, except with the smallest set, where a significant decrease was observed. Finally, we examined if the localization of SNP loci (genic vs. intergenic) affected the PA. No gain in PA was observed using SNPs located in genic regions. In summary, we found that there is considerable scope for decreasing the cost of genotyping in barley (to capture ~2000 SNPs) without loss of PA.

20.
MOTIVATIONS: The tag SNP approach is a valuable tool in whole genome association studies, and a variety of algorithms have been proposed to identify the optimal tag SNP set. Currently, most tag SNP selection is based on two-marker (pairwise) linkage disequilibrium (LD). Recent literature has shown that multiple-marker LD also contains useful information that can further increase the genetic coverage of the tag SNP set. Thus, tag SNP selection methods that incorporate multiple-marker LD are expected to have advantages in terms of genetic coverage and statistical power. RESULTS: We propose a novel algorithm to select tag SNPs in an iterative procedure. In each iteration loop, the SNP that captures the most neighboring SNPs (through pair-wise and multiple-marker LD) is selected as a tag SNP. We optimize the algorithm and computer program to make our approach feasible on today's typical workstations. Benchmarked using HapMap release 21, our algorithm outperforms standard pair-wise LD approach in several aspects. (i) It improves genetic coverage (e.g. by 7.2% for 200 K tag SNPs in HapMap CEU) compared to its conventional pair-wise counterpart, when conditioning on a fixed tag SNP number. (ii) It saves genotyping costs substantially when conditioning on fixed genetic coverage (e.g. 34.1% saving in HapMap CEU at 90% coverage). (iii) Tag SNPs identified using multiple-marker LD have good portability across closely related ethnic groups and (iv) show higher statistical power in association tests than those selected using conventional methods. AVAILABILITY: A computer software suite, multiTag, has been developed based on this novel algorithm. The program is freely available by written request to the author at ke_hao@merck.com
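The iterative procedure described above can be sketched for the conventional pairwise case only. This is not the multiTag implementation: multiple-marker LD is left out, the r² threshold of 0.8 is an assumed default, no neighbourhood window is applied, and the function names and toy haplotype data are hypothetical.

```python
def r_squared(x, y):
    """Pairwise LD (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:  # a monomorphic SNP has no defined LD; treat as 0
        return 0.0
    d = pxy - px * py
    return d * d / denom


def greedy_tags(snps, threshold=0.8):
    """In each iteration, pick the SNP that captures the most uncaptured SNPs."""
    n = len(snps)
    captures = [
        {i} | {j for j in range(n) if r_squared(snps[i], snps[j]) >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        best = max(sorted(uncovered), key=lambda i: len(captures[i] & uncovered))
        tags.append(best)
        uncovered -= captures[best]
    return tags


# Toy data: four SNPs across six haplotypes (rows are SNPs).
snps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],  # identical to SNP 0
    [1, 1, 0, 0, 1, 0],  # complement of SNP 0, so r^2 = 1
    [0, 1, 0, 1, 0, 1],  # weakly correlated with the others
]
print(greedy_tags(snps))
```

Extending the `captures` sets so that a SNP also counts as captured when a combination of already chosen tags predicts it would move this sketch toward the multiple-marker idea in the abstract.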
