Similar literature
1.
Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
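
The idea of a near-minimal union of population-specific tagSNP sets can be illustrated with a small sketch. It assumes each population has already been split into LD bins and that any SNP listed for a bin can tag it; the function name `greedy_multipop_tags`, the input layout, and the toy rs identifiers are hypothetical. It uses a plain greedy hitting-set heuristic and is not the MultiPop-TagSelect algorithm itself.

```python
from collections import Counter


def greedy_multipop_tags(bins_by_pop):
    """Pick a small set of SNPs so that every bin in every population is tagged.

    bins_by_pop maps a population label to a list of sets; each set holds the
    SNPs that are interchangeable tags for one LD bin (an assumed input format).
    """
    # Flatten to (population, candidate tags) pairs that still need a tag.
    pending = [(pop, frozenset(b)) for pop, bins in bins_by_pop.items() for b in bins]
    if any(not cands for _, cands in pending):
        raise ValueError("every bin needs at least one candidate tagSNP")

    chosen = []
    while pending:
        # Count how many untagged bins each candidate SNP would cover.
        counts = Counter(snp for _, cands in pending for snp in cands)
        # Greedy step: take the SNP covering the most bins (ties broken by name).
        best = min(counts, key=lambda s: (-counts[s], s))
        chosen.append(best)
        pending = [(pop, cands) for pop, cands in pending if best not in cands]
    return chosen


# Toy example with made-up SNP identifiers.
example_bins = {
    "AFR": [{"rs1", "rs2"}, {"rs3"}, {"rs4", "rs5"}],
    "EUR": [{"rs2", "rs3"}, {"rs5", "rs6"}],
    "ASN": [{"rs2", "rs6"}],
}
print(greedy_multipop_tags(example_bins))
```

In this toy input, choosing tags independently per population (for example the alphabetically first SNP of each bin) would need five distinct SNPs, whereas the greedy pass covers all six bins with three.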

2.
Analyses of high-density SNPs in genetic studies have the potential problems of prohibitive genotyping costs and inflated false discovery rates. Current methods select subsets of representative SNPs (tagSNPs) using information either on potential biologic functionality of the SNPs or on the underlying linkage disequilibrium (LD) structure, but not both. Combining the two types of information may lead to more effective tagSNP selection. The proposed method combines both functional and LD information using a weighted factor analysis (WFA) model. The WFA was applied to the dense SNP collection from 129 genes sequenced by the SeattleSNPs Program for Genomic Application. TagSNPs selected by WFA were compared with those selected by an LD-based method. WFA allowed prioritization of SNPs that would otherwise share equivalent ranking due to underlying LD structure alone. Furthermore, WFA consistently included SNPs not selected by function or by LD alone. A literature review of a subset of genes revealed that SNPs selected by WFA were more likely represented in published reports.

3.
Linkage disequilibrium (LD) has received much attention recently because of its value in localizing disease-causing genes. Due to the extensive LD between neighboring loci in the human genome, it is believed that a subset of the single nucleotide polymorphisms in a region (tagSNPs) can be selected to capture most of the remaining SNP variants. In this study, we examined LD patterns and HapMap tagSNP transferability in more than 300 individuals. A South Indian sample and an African Mbuti Pygmy population sample were included to evaluate the performance of HapMap tagSNPs in geographically distinct and genetically isolated populations. Our results show that HapMap tagSNPs selected with r² ≥ 0.8 can capture more than 85% of the SNPs in populations that are from the same continental group. Combined tagSNPs from HapMap CEU and CHB+JPT serve as the best reference for the Indian sample. The HapMap YRI are a sufficient reference for tagSNP selection in the Pygmy sample. In addition to our findings, we reviewed over 25 recent studies of tagSNP transferability and propose a general guideline for selecting tagSNPs from HapMap populations.

4.

Background

A subset of single nucleotide polymorphisms, the tagSNPs, can be useful for capturing information on untyped SNPs in a genomic region. TagSNP transferability from the HapMap dataset to admixed populations is of uncertain value due to population structure, admixture, drift and recombination effects. In this work, an empirical dataset from a Brazilian admixed sample was evaluated against the HapMap populations to measure tagSNP transferability and the relative loss of variability prediction.

Methods

The transferability study was carried out using SNPs dispersed over four genomic regions: the PTPN22, HMGCR, VDR and CETP genes. Variability coverage and the prediction accuracy for tagSNPs in the selected genomic regions of HapMap phase II were computed using a prediction accuracy algorithm. Transferability of tagSNPs and relative loss of prediction were evaluated according to the difference between the Brazilian sample and the pooled and single HapMap population estimates.

Results

Each population presented different levels of prediction per gene. On average, the Brazilian (BRA) sample displayed a lower power of prediction when compared to HapMap and the pooled sample. There was a relative loss of prediction for BRA when using single HapMap populations, but a pooled HapMap dataset generated minor loss of variability prediction and lower standard deviations, except at the VDR locus at which loss was minor using CEU tagSNPs.

Conclusion

Studies involving tagSNP selection for an admixed population should generally not rely on any single HapMap population; in most cases a pooled dataset represents the population better.

5.
Exploiting the association between single nucleotide polymorphisms (SNPs) can potentially reduce the costs of association mapping of common disease genes. Different methods have been proposed for defining subsets of SNPs as proxies (or tagSNPs) for other SNPs, some of which rely upon a model of haplotype blocks. Other approaches only consider the pair-wise correlation between markers as a basis for selecting tagSNPs. Yet another, recently proposed model-based method takes marker heterozygosity and genetic distance into account in order to maximize the expected utility of a marker set to map frequent, but unobserved genetic variants. We compared these tagging approaches with regard to their ability to correlate tagSNPs and bi-allelic, potentially disease-causing genetic variants. We used the CEU sample of chromosome 19 from the HapMap project for an initial comparison, and demonstrated a comparable performance of both approaches but a difference in terms of tagSNPs selected and variants captured. In any case, we conclude that a considerable loss of information appears to be inherent to any type of SNP tagging, even when dense marker sets are available for SNP selection.

6.
Significant efforts have been made to determine the correlation structure of common SNPs in the human genome. One method has been to identify the sets of tagSNPs that capture most of the genetic variation. Here, we evaluate the transferability of tagSNPs between populations using a population sample of Sami, the indigenous people of Scandinavia. Array-based SNP discovery in a 4.4 Mb region of 28 phased copies of chromosome 21 uncovered 5,132 segregating sites, 3,188 of which had a minimum minor allele frequency (mMAF) of 0.1. Due to the population structure and consequently high LD, the number of tagSNPs needed to capture all SNP variation in Sami is much lower than that for the HapMap populations. TagSNPs identified from the HapMap data perform only slightly better in the Sami than choosing tagSNPs at random from the same set of common SNPs. Surprisingly, tagSNPs defined from the HapMap data did not perform better than selecting the same number of SNPs at random from all SNPs discovered in Sami. Nearly half (46%) of the Sami SNPs with a mMAF of 0.1 are not present in the HapMap dataset. Among sites overlapping between Sami and HapMap populations, 18% are not tagged by the European American (CEU) HapMap tagSNPs, while 43% of the SNPs that are unique to Sami are not tagged by the CEU tagSNPs. These results point to serious limitations in the transferability of common tagSNPs to capture random sequence variation, even between closely related populations, such as CEU and Sami.

7.
The capability of molecular markers to provide information on genetic structure is influenced by their number and the way they are chosen. This study evaluates the effects of single nucleotide polymorphism (SNP) number and selection strategy on estimates of germplasm diversity and population structure for different types of barley germplasm, namely cultivar and landrace. One hundred and sixty-nine barley landraces from Syria and Jordan and 171 European barley cultivars were genotyped with 1536 SNPs. Different subsets of 384 and 96 SNPs were selected from the 1536 set, based on their ability to detect diversity in landraces or cultivated barley, in addition to corresponding randomly chosen subsets. All SNP sets except the landrace-optimised subsets underestimated the diversity present in the landrace germplasm, and all subsets of SNPs gave similar estimates for cultivar germplasm. All marker subsets gave qualitatively similar estimates of the population structure in both germplasm sets, but the 96 SNP sets showed much lower data resolution values than the larger SNP sets. From these data we deduce that pre-selecting markers for their diversity in a germplasm set is very worthwhile in terms of the quality of data obtained. Second, we suggest that a properly chosen 384 SNP subset gives a good combination of power and economy for germplasm characterization, whereas the rather modest gain from using 1536 SNPs does not justify the increased cost and 96 markers give unacceptably low performance. Lastly, we propose a specific 384 SNP subset as a standard genotyping tool for middle-eastern landrace barley.

8.
Common genetic polymorphisms may explain a portion of the heritable risk for common diseases. Within candidate genes, the number of common polymorphisms is finite, but direct assay of all existing common polymorphisms is inefficient, because genotypes at many of these sites are strongly correlated. Thus, it is not necessary to assay all common variants if the patterns of allelic association between common variants can be described. We have developed an algorithm to select the maximally informative set of common single-nucleotide polymorphisms (tagSNPs) to assay in candidate-gene association studies, such that all known common polymorphisms either are directly assayed or exceed a threshold level of association with a tagSNP. The algorithm is based on the r² linkage disequilibrium (LD) statistic, because r² is directly related to statistical power to detect disease associations with unassayed sites. We show that, at a relatively stringent r² threshold (r² > 0.8), the LD-selected tagSNPs resolve >80% of all haplotypes across a set of 100 candidate genes, regardless of recombination, and tag specific haplotypes and clades of related haplotypes in nonrecombinant regions. Thus, if the patterns of common variation are described for a candidate gene, analysis of the tagSNP set can comprehensively interrogate for main effects from common functional variation. We demonstrate that, although common variation tends to be shared between populations, tagSNPs should be selected separately for populations with different ancestries.
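
Threshold-based binning on r² can be sketched in a few lines. The sketch assumes phased haplotypes coded 0/1, for which r² equals the squared Pearson correlation between two sites; the function names, the input layout, and the toy data are hypothetical, and the greedy rule shown (repeatedly take the SNP that exceeds the threshold with the most still-untagged SNPs) is an illustration rather than the published implementation.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two equal-length 0/1 allele vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_ld_bins(haplotypes, threshold=0.8):
    """Group SNPs into bins; haplotypes maps SNP name -> list of 0/1 alleles."""
    snps = sorted(haplotypes)
    # partners[s]: s itself plus every SNP whose r^2 with s exceeds the threshold.
    partners = {
        s: {t for t in snps
            if t == s or r_squared(haplotypes[s], haplotypes[t]) > threshold}
        for s in snps
    }
    untagged = set(snps)
    bins = []
    while untagged:
        # Greedy step: the SNP that covers the most untagged SNPs seeds a bin.
        seed = min(untagged, key=lambda s: (-len(partners[s] & untagged), s))
        members = partners[seed] & untagged
        # Any member above threshold with all other members is an equivalent tag.
        tags = sorted(m for m in members if members <= partners[m])
        bins.append({"members": sorted(members), "tags": tags})
        untagged -= members
    return bins


# Toy haplotypes over eight chromosomes (made-up data).
toy_haps = {
    "snpA": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpB": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snpA
    "snpC": [1, 1, 0, 0, 1, 0, 1, 0],  # mirror image of snpA, so r^2 is 1
    "snpD": [0, 1, 0, 1, 0, 1, 0, 1],
    "snpE": [0, 0, 0, 0, 1, 1, 1, 1],
}
for ld_bin in greedy_ld_bins(toy_haps):
    print(ld_bin)
```

Keeping the list of equivalent tags per bin, instead of a single tag, is what leaves room for later choices between nearly equivalent SNPs.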

9.
Prediction of breed composition in an admixed cattle population
Swiss Fleckvieh was established in 1970 as a composite of Simmental (SI) and Red Holstein Friesian (RHF) cattle. Breed composition is currently reported based on pedigree information. Information on a large number of molecular markers potentially provides more accurate information. For the analysis, we used Illumina BovineSNP50 Genotyping Beadchip data for 90 pure SI, 100 pure RHF and 305 admixed bulls. The scope of the study was to compare the performance of hidden Markov models, as implemented in structure software, with methods conventionally used in genomic selection [BayesB, partial least squares regression (PLSR), least absolute shrinkage and selection operator (LASSO) variable selection] for predicting breed composition. We checked the performance of algorithms for a set of 40 492 single nucleotide polymorphisms (SNPs), subsets of evenly distributed SNPs and subsets with different allele frequencies in the pure populations, using FST as an indicator. Key results are correlations of admixture levels estimated with the various algorithms with admixture based on pedigree information. For the full set, PLSR, BayesB and structure performed in a very similar manner (correlations of 0.97), whereas the correlation of LASSO and pedigree admixture was lower (0.93). With decreasing number of SNPs, correlations decreased substantially only for 5% or 1% of all SNPs. With SNPs chosen according to FST, results were similar to results obtained with the full set. Only when using 96 and 48 SNPs with the highest FST did correlations drop to 0.92 and 0.90, respectively. Reducing the number of pure animals in training sets to 50, 20 and 10 each did not cause a drop in the correlation with pedigree admixture.
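
Ranking SNPs by FST between two pure populations can be sketched as follows. The abstract does not say which FST estimator was used, so this sketch assumes a simple heterozygosity-based form for a biallelic SNP with equally weighted populations, FST = (H_T - H_S) / H_T; the function names and the allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """FST for one biallelic SNP from the allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                              # SNP is fixed in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


def top_fst_snps(freqs, k):
    """Return the k SNP names with the highest FST; freqs maps name -> (p1, p2)."""
    ranked = sorted(freqs, key=lambda s: (-fst_biallelic(*freqs[s]), s))
    return ranked[:k]


# Made-up allele frequencies in two pure breeds.
toy_freqs = {
    "s1": (0.9, 0.1),
    "s2": (0.5, 0.5),
    "s3": (0.7, 0.4),
    "s4": (1.0, 0.0),
}
print(top_fst_snps(toy_freqs, 2))
```

SNPs with frequencies near 0 in one breed and near 1 in the other score close to 1 and are the most informative for breed assignment, which is consistent with small high-FST panels performing well in the abstract.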

10.
Our goal was to compare methods for tagging single-nucleotide polymorphisms (tagSNPs) with respect to the power to detect disease association under differing haplotype-disease association models. We were also interested in the effect that SNP selection samples, consisting of either cases, controls, or a mixture, would have on power. We investigated five previously described algorithms for choosing tagSNPs: two that picked SNPs based on haplotype structure (Chapman-haplotypic and Stram), two that picked SNPs based on pair-wise allelic association (Chapman-allelic and Cousin), and one control method that chose equally spaced SNPs (Zhai). In two disease-associated regions from the Genetic Analysis Workshop 14 simulated data, we tested the association between tagSNP genotype and disease over the tagSNP sets chosen by each method for each sampling scheme. This was repeated for 100 replicates to estimate power. The two allelic methods chose essentially all SNPs in the region and had nearly optimal power. The two haplotypic methods chose about half as many SNPs. The haplotypic methods had poor performance compared to the allelic methods in both regions. We expected an improvement in power when the selection sample contained cases; however, there was only moderate variation in power between the sampling approaches for each method. Finally, when compared to the haplotypic methods, the reference method performed as well or worse in the region with ancestral disease haplotype structure.

11.
According to the general approach developed in this paper, dynamic management of genetic variability in selected populations of dairy cattle is carried out for three simultaneous purposes: procreation of young bulls to be further progeny-tested, use of service bulls already selected and approval of recently progeny-tested bulls for use. At each step, the objective is to minimize the average pairwise relationship coefficient in the future population born from programmed matings and the existing population. As a common constraint, the average estimated breeding value of the new population, for a selection goal including many important traits, is set to a desired value. For the procreation of young bulls, breeding costs are additionally constrained. Optimization is fully analytical and directly considers matings. Corresponding algorithms are presented in detail. The efficiency of these procedures was tested on the current Norman population. Comparisons between optimized and real matings clearly showed that optimization would have saved substantial genetic variability without reducing short-term genetic gains.

12.
The genetic factors associated with carotid artery disease (CAAD) are not fully known. Because of its role in lipid metabolism, we hypothesized that common genetic variation in the very low density lipoprotein receptor (VLDLR) gene is associated with severe CAAD (>80% stenosis), body mass index (BMI), and lipid traits in humans. VLDLR was resequenced for variation discovery in 92 subjects, and tagging single nucleotide polymorphisms (tagSNPs) were chosen for genotyping in a larger cohort (n = 1,027). Of the 17 tagSNPs genotyped, one tagSNP (SNP 1226; rs1454626) located in the 5' flanking region of VLDLR was associated with CAAD, BMI, and LDL-associated apolipoprotein B (apoB). We also identified receptor-ligand genetic interactions between VLDLR 1226 and APOE genotype for predicting CAAD case status. These findings may further our understanding of VLDLR function, its ligand APOE, and ultimately the pathogenesis of CAAD in the general population.

13.
Single nucleotide polymorphisms (SNPs) are increasingly used to tag genetic loci associated with phenotypes such as risk of complex diseases. Technically, this is done genome-wide without prior restriction or knowledge of biological feasibility in scans referred to as genome-wide association studies (GWAS). Depending on the linkage disequilibrium (LD) structure at a particular locus, such tagSNPs may be surrogates for many thousands of other SNPs, and it is difficult to distinguish those that may play a functional role in the phenotype from those simply genetically linked. Because a large proportion of tagSNPs have been identified within non-coding regions of the genome, distinguishing functional from non-functional SNPs has been an even greater challenge. A strategy was recently proposed that prioritizes surrogate SNPs based on non-coding chromatin and epigenomic mapping techniques that have become feasible with the advent of massively parallel sequencing. Here, we introduce an R/Bioconductor software package that enables the identification of candidate functional SNPs by integrating information from tagSNP locations, lists of linked SNPs from the 1000 Genomes Project and locations of chromatin features which may have functional significance. Availability: FunciSNP is available from Bioconductor (bioconductor.org).

14.
Association mapping of complex traits typically employs tagSNP genotype data to identify a trait locus within a region of interest. However, considerable debate exists regarding the most powerful strategy for utilizing such tagSNP data for inference. A popular approach tests each tagSNP within the region individually, but such tests could lose power as a result of incomplete linkage disequilibrium between the genotyped tagSNP and the trait locus. Alternatively, one can jointly test all tagSNPs simultaneously within the region (by using genotypes or haplotypes), but such multivariate tests have large degrees of freedom that can also compromise power. Here, we consider a semiparametric model for quantitative-trait mapping that uses genetic information from multiple tagSNPs simultaneously in analysis but produces a test statistic with reduced degrees of freedom compared to existing multivariate approaches. We fit this model by using a dimension-reducing technique called least-squares kernel machines, which we show is identical to analysis using a specific linear mixed model (which we can fit by using standard software packages like SAS and R). Using simulated SNP data based on real data from the International HapMap Project, we demonstrate that our approach often has superior performance for association mapping of quantitative traits compared to the popular approach of single-tagSNP testing. Our approach is also flexible, because it allows easy modeling of covariates and, if interest exists, high-dimensional interactions among tagSNPs and environmental predictors.

15.
Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93 × 10⁻¹⁰) and BMP2 (rs4813802, P = 4.65 × 10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33 × 10⁻⁸) and rs11632715 (P = 2.30 × 10⁻¹⁰). As low-penetrance predisposition variants become harder to identify (owing to small effect sizes and/or low risk allele frequencies), approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.

16.
Several studies have shown that computation of genomic estimated breeding values (GEBV) with accuracies significantly greater than parent average (PA) estimated breeding values (EBVs) requires genotyping of at least several thousand progeny-tested bulls. For all published analyses, GEBV computed from the selected samples of markers have lower or equal accuracy than GEBV derived on the basis of all valid single nucleotide polymorphisms (SNPs). In the current study, we report on four new methods for selection of markers. Milk, fat, protein, somatic cell score, fertility, persistency, herd life and the Israeli selection index were analyzed. The 972 Israeli Holstein bulls genotyped with EBV for milk production traits computed from daughter records in 2012 were assigned into a training set of 844 bulls with progeny test EBV in 2008, and a validation set of 128 young bulls. Numbers of bulls in the two sets varied slightly among the nonproduction traits. In EFF12, SNPs were first selected for each trait based on the effects of each marker on the bulls’ 2012 EBV corrected for effective relationships, as determined by the SNP matrix. EFF08 was the same as EFF12, except that the SNPs were selected on the basis of the 2008 EBV. In DIFmax, the SNPs with the greatest differences in allelic frequency between the bulls in the training and validation sets were selected, whereas in DIFmin the SNPs with the smallest differences were selected. For all methods, the numbers of SNPs retained varied over the range of 300 to 6000. For each trait, except fertility, an optimum number of markers between 800 and 5000 was obtained for EFF12, based on the correlation between the GEBV and current EBV of the validation bulls. For all traits, the difference between the correlation of GEBV and current EBV and the correlation of the PA and current EBV was >0.25. EFF08 was inferior to EFF12, and was generally no better than PA EBV. DIFmax always outperformed DIFmin and generally outperformed EFF08 and PA. Furthermore, GEBV based on DIFmax were generally less biased than PA. It is likely that other methods of SNP selection could improve upon these results.

17.
A method that predicts the genetic composition and inbreeding (F) of the future dairy cow population using information on the current cow population, semen use and progeny test bulls is described. This is combined with information on genetic merit of bulls to compare bull selection methods that minimise F and maximise breeding value for profit (called APR in Australia). The genetic composition of the future cow population of Australian Holstein-Friesian (HF) and Jersey up to 6 years into the future was predicted. F in Australian HF and Jersey breeds is likely to increase by about 0.002 and 0.003 per year between 2002 and 2008, respectively. A comparison of bull selection methods showed that a method that selects the best bull from all available bulls for each current or future cow, based on its calf's APR minus F depression, is better than bull selection methods based on APR alone, APR adjusted for mean F of prospective progeny after random mating and mean APR adjusted for the relationship between the selected bulls. This method reduced F of prospective progeny by about a third to a half compared to the other methods when bulls are mated to current and future cows that will be available 5 to 6 years from now. The method also reduced the relationship between the bulls selected to nearly the same extent as the method that is aimed at maximising genetic gain adjusted for the relationship between bulls. The method achieves this because cows with different pedigree exist in the population and the method selects relatively unrelated bulls to mate to these different cows. Selecting the best bull for each current or future cow so that the calf's genetic merit minus F depression is maximised can slow the rate of increase in F in the population.

18.
A major challenge for genomewide disease association studies is the high cost of genotyping large numbers of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as "tagging" SNPs, able to capture most variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve a computation-intensive search for SNP subsets but discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here does not limit itself to using only correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that the number of tagging SNPs selected by our method is smaller than by using block-based methods. Supplementary website: http://htsnp.stanford.edu/FSFS/.

19.
Estimated breeding values for average daily feed intake (AFI; kg/day), residual feed intake (RFI; kg/day) and average daily gain (ADG; kg/day) were generated using a mixed linear model incorporating genomic relationships for 698 Angus steers genotyped with the Illumina BovineSNP50 assay. Association analyses of estimated breeding values (EBVs) were performed for 41,028 single nucleotide polymorphisms (SNPs), and permutation analysis was used to empirically establish the genome-wide significance threshold (P < 0.05) for each trait. SNPs significantly associated with each trait were used in a forward selection algorithm to identify genomic regions putatively harbouring genes with effects on each trait. A total of 53, 66 and 68 SNPs explained 54.12% (24.10%), 62.69% (29.85%) and 55.13% (26.54%) of the additive genetic variation (when accounting for the genomic relationships) in steer breeding values for AFI, RFI and ADG, respectively, within this population. Evaluation by pathway analysis revealed that many of these SNPs are in genomic regions that harbour genes with metabolic functions. The presence of genetic correlations between traits resulted in 13.2% of SNPs selected for AFI and 4.5% of SNPs selected for RFI also being selected for ADG in the analysis of breeding values. While our study identifies panels of SNPs significant for efficiency traits in our population, validation of all SNPs in independent populations will be necessary before commercialization.
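
An empirical genome-wide threshold from permutations can be sketched as below. The abstract does not state the test statistic, so this sketch assumes the absolute Pearson correlation between SNP genotype and EBV, and it takes the (1 - alpha) quantile of the per-permutation maximum across SNPs; the function names, defaults, and data layout are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold on |r| under the null of no association.

    genotypes: list of per-SNP genotype vectors (one value per animal).
    phenotype: list of trait values (for example EBVs), one per animal.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        # Shuffling the trait breaks any SNP-trait link but keeps LD among SNPs.
        rng.shuffle(shuffled)
        max_stats.append(max(abs_corr(g, shuffled) for g in genotypes))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the null distribution of the maximum.
    idx = min(len(max_stats) - 1, int((1 - alpha) * len(max_stats)))
    return max_stats[idx]
```

A SNP whose observed statistic exceeds the returned value would be called significant at the genome-wide level. Using the maximum over all SNPs in each permutation is what accounts for multiple testing without assuming the markers are independent.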

20.