Similar literature
1.
MOTIVATION: The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities enables hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may inflate false associations between markers and phenotype. RESULTS: We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. This choice of algorithm allows highly accurate genotyping for most SNPs. We also introduce a quality control metric for each assayed SNP, so that poorly behaving SNPs can be filtered using a metric that correlates with genotype class separation in the calling algorithm. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotype calls without inflating false positives due to systematically mis-called SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false-positive and false-negative association signals, allowing rapid identification of disease susceptibility loci for complex traits. AVAILABILITY: SNiPer-HD is available at TGen's website: http://www.tgen.org/neurogenomics/data.
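To make the EM step concrete, here is a minimal sketch. It is not SNiPer-HD's code: it assumes a hypothetical per-sample allele-contrast value such as (A − B)/(A + B), fits a three-cluster (BB, AB, AA) one-dimensional Gaussian mixture by EM, and uses fixed starting means as a stand-in for the parameters that SNiPer-HD derives from a training set. Samples whose largest posterior falls below `min_post` (an assumed threshold) are left uncalled.

```python
import math

LABELS = ("BB", "AB", "AA")  # clusters ordered by increasing allele contrast


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, weight, mu, var):
    """Posterior probability of each genotype cluster for every sample."""
    resp = []
    for x in xs:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weight, mu, var)]
        total = sum(dens) or 1e-300  # guard against underflow far from all clusters
        resp.append([d / total for d in dens])
    return resp


def em_call(xs, init_means=(-0.8, 0.0, 0.8), init_var=0.05, n_iter=50, min_post=0.9):
    """Call genotypes from hypothetical allele-contrast values with a 3-cluster EM fit."""
    k = len(init_means)
    mu, var, weight = list(init_means), [init_var] * k, [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(xs, weight, mu, var)
        for j in range(k):  # M-step: update weight, mean and variance per cluster
            nj = sum(r[j] for r in resp)
            weight[j] = nj / len(xs)
            if nj > 1e-8:  # leave an empty cluster's mean and variance unchanged
                mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-4)
    resp = e_step(xs, weight, mu, var)  # final posteriors under the fitted parameters
    calls = []
    for r in resp:
        best = max(range(k), key=lambda j: r[j])
        calls.append(LABELS[best] if r[best] >= min_post else "NC")  # NC = no call
    return calls


print(em_call([-0.9, -0.85, -0.8, 0.02, -0.05, 0.0, 0.9, 0.85, 0.8]))
```

In this sketch the posterior threshold acts as a per-call confidence filter; a per-SNP quality metric based on cluster separation, as the abstract describes, would be a separately defined statistic.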

2.
Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits using high-density SNP markers in maize. Here, we expanded our association panel from 368 to 513 inbred lines with 0.5 million high-quality SNPs using a two-step data-imputation method that combines identity-by-descent (IBD) based projection with a k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits in the panel of 513 inbred lines, applying both the mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified with the MLM method at the Bonferroni-corrected threshold −log10(P) > 5.74 (α = 1). Many loci, ranging from one to 34 per trait (107 loci for plant height), were identified for the 17 traits using the A-D test at the Bonferroni-corrected threshold −log10(P) > 7.05 (α = 0.05) with 556,809 SNPs. Many known loci and new candidate loci were observed only with the A-D test, and a few of these were also detected in independent linkage analysis. This study indicates that combining IBD-based projection and the KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we show that the A-D test is a useful complement in GWAS of complex quantitative traits. Especially for traits with non-normal phenotype distributions, or those controlled by loci of moderate effect or by rare variants, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
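The A-D threshold quoted above follows from a Bonferroni correction expressed on the −log10 scale, −log10(α/m), where m is the number of SNPs tested. A short check, assuming nothing beyond the α and SNP count stated in the abstract:

```python
import math


def bonferroni_neglog10(alpha: float, n_tests: int) -> float:
    """Bonferroni-corrected significance threshold expressed as -log10(P)."""
    return -math.log10(alpha / n_tests)


# alpha = 0.05 spread over 556,809 SNPs
print(f"{bonferroni_neglog10(0.05, 556809):.2f}")  # prints 7.05
```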

3.
We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10⁻⁷ significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between the genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages.

4.
False-positive or false-negative results attributable to undetected genotyping errors and confounding factors present a constant challenge for genome-wide association studies (GWAS), given the weak signals associated with complex phenotypes and the noise associated with high-throughput genotyping. In the context of the Genetics of Kidneys in Diabetes (GoKinD) study, we identify a source of error in genotype calling and demonstrate that a standard battery of quality-control (QC) measures is not sufficient to detect and/or correct it. We show that, if genotyping and calling are done by plate (batch), even a few DNA samples of marginally acceptable quality can profoundly alter the allele calls for other samples on the plate. In turn, this leads to significant differential bias in estimates of allele frequency between plates and, potentially, to false-positive associations, particularly when case and control samples are not sufficiently randomized across plates. This problem may become widespread as investigators tap into existing public databases for GWAS control samples. We describe how to detect and correct this bias by using additional sources of information, including raw signal-intensity data.

5.
A genotype calling algorithm for the Illumina BeadArray platform (cited 2 times: 0 self-citations, 2 by others)
MOTIVATION: Large-scale genotyping relies on unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have recently been established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated, easy-to-use genotype calling software. RESULTS: We introduce a model-based genotype calling algorithm that neither relies on prior training data nor requires computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across individuals to improve the calls. By identifying the optimal coordinates at which to initialize the algorithm, the method accommodates variations in hybridization intensity that cause dramatic shifts in the positions of the genotype clouds. By incorporating perturbation analysis, we obtain a quality metric that measures the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and low accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available from the authors on request.

6.
Genome-wide association studies (GWAS) may benefit from using haplotype information to make marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on factors such as the genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare), offering a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested with the efficient mixed-model association (EMMA) algorithm. When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose after the marker variants, and 2) in the analysis of empirical heading date data. These results demonstrate that the information content of haplotypes depends on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, so haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single-SNP and haplotype markers in GWAS to take advantage of the full information content of the genotype data.
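The simple sliding-window haplotype construction mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: lines are assumed homozygous (reasonable for inbred barley), so each line's allele string is already a haplotype, and the window advances one SNP at a time; the window size and step used in the study are not given in the abstract. Each window's distinct allele strings become the alleles of one multi-allelic marker, which would then be tested with a mixed model such as EMMA (not shown).

```python
from typing import Dict, List


def sliding_window_haplotypes(lines: List[str], window: int) -> List[Dict[str, List[int]]]:
    """Group consecutive SNPs into overlapping windows of `window` markers.

    lines: one allele string per inbred line, one character per SNP in map order
           (lines are assumed homozygous, so each string is already a haplotype).
    Returns one dict per window start, mapping each observed haplotype to the
    indices of the lines that carry it.
    """
    n_snps = len(lines[0])
    windows = []
    for start in range(n_snps - window + 1):  # step of one SNP (assumed)
        groups: Dict[str, List[int]] = {}
        for idx, seq in enumerate(lines):
            groups.setdefault(seq[start:start + window], []).append(idx)
        windows.append(groups)
    return windows


# Hypothetical example: four lines, three SNPs, windows of two SNPs
print(sliding_window_haplotypes(["AAG", "AAT", "CAG", "CAG"], 2))
# [{'AA': [0, 1], 'CA': [2, 3]}, {'AG': [0, 2, 3], 'AT': [1]}]
```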

7.
Multiple sclerosis (MS) is a chronic autoimmune disease of the central nervous system that predominantly affects young adults. The genetic contribution to this multifactorial disease was underscored by a genome-wide association study (GWAS) conducted by the International Multiple Sclerosis Genetics Consortium in a multinational cohort, which led to the discovery of 57 non-MHC MS-associated common genetic variants. So far, few of these newly reported variants have been replicated in larger independent patient cohorts. We genotyped a cohort of 1033 MS patients and 644 healthy controls with a consistent genetic background for the 57 non-MHC variants reported to be associated with MS by the first large GWAS, as well as for the HLA DRB1*1501 tagging SNP rs3135388. We robustly replicated three of the 57 reported non-MHC MS-associated single nucleotide polymorphisms (SNPs). In addition, our study revealed several genotype-genotype combinations with a clearly stronger disease association than the genotypes of the single SNPs. We further correlated well-defined clinical phenotypes, namely ataxia, visual impairment due to optic neuritis, and paresis, with single SNPs and genotype combinations, and identified several associations. The results may open new avenues for clinical implications of the MS-associated genetic variants reported by large GWAS.

8.
Recent genome-wide association studies (GWAS) identified a list of single-nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD). Replication of GWAS findings in different populations corroborates the associations observed in the parent GWAS. In this study, we aimed to replicate the association of rs1870634, a GWAS-identified SNP, with CAD in an Iranian population. The study population consisted of 267 subjects undergoing coronary angiography, including 155 CAD patients and 112 age- and gender-matched non-CAD controls. The rs1870634 SNP was genotyped using the high-resolution melting (HRM) analysis technique. The GG genotype frequency was significantly higher in CAD patients than in controls (P = 0.03). Binary logistic regression suggested that this genotype was significantly associated with CAD risk after adjustment for age, BMI, sex, TC, and LDL-C levels (OR 2.78, 95% CI 1.10–7.01, P = 0.03). Moreover, carriers of the GG+TG genotypes were 2.52 times more likely to develop CAD (95% CI 1.05–6.03) than TT genotype carriers after adjustment for age, sex, and lipid profile (P = 0.037). These data suggest that the GG genotype may be associated with increased risk of CAD in a sample of the Iranian population.
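For readers checking how an odds ratio and its 95% CI relate to a 2×2 genotype-by-status table, here is a sketch of the unadjusted odds ratio with a Woolf (log-scale) interval. The study's estimates come from logistic regression adjusted for covariates, so this will not reproduce them, and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: cases with the risk genotype, b: cases without it,
    c: controls with the risk genotype, d: controls without it.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: add a continuity correction or use an exact method")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, not taken from the study
print(odds_ratio_ci(20, 10, 10, 20))
```

Because the interval is built on the log scale, it is asymmetric around the odds ratio on the original scale, which is also why published CIs such as 1.10–7.01 around 2.78 look lopsided.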

9.
Current genotype-calling methods such as Robust Linear Model with Mahalanobis Distance Classifier (RLMM) and Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) provide accurate calls for Affymetrix single nucleotide polymorphism (SNP) chips. However, these methods are computationally expensive because they employ preprocessing procedures, including chip data normalization, and other sophisticated statistical techniques. Moreover, with small samples their accuracy may drop significantly. We develop a new genotype calling method for Affymetrix 100K and 500K SNP chips. A two-stage classification scheme is proposed to obtain a fast genotype calling algorithm. The first stage uses unsupervised classification to quickly discriminate genotypes with high accuracy for the majority of SNPs. The second stage employs a supervised classification method that incorporates allele frequency information, either from the HapMap data or from a self-training scheme. A confidence score is provided for every genotype call. Overall performance is comparable to that of CRLMM, as verified against the known gold-standard HapMap data, and is superior for small samples. The new algorithm is computationally simple and standalone, in the sense that a self-training scheme can be used without any other training data. A package implementing the calling algorithm is freely available at http://www.sfs.ecnu.edu.cn/teachers/xuj_en.html.

10.
Our initial genome-wide association study (GWAS) demonstrated that two SNPs (ARS-BFGL-NGS-33248, UA-IFASA-9288) within the protein tyrosine kinase 2 (PTK2) gene were significantly associated with milk production traits in Chinese Holstein dairy cattle. To validate whether these GWAS signals were true positives, we performed a replication study of genotype-phenotype associations. Both SNPs showed significant associations with milk production traits, confirming the associations observed in the original study. In addition, by sequencing the PTK2 gene in 14 unrelated Chinese Holstein sires, we identified a total of 33 novel SNPs. Thirteen of these SNPs were genotyped and tested for association with milk production traits in an independent resource population. After Bonferroni correction for multiple testing, twelve SNPs were statistically significant for more than two milk production traits. Pairwise linkage disequilibrium (LD), measured by D’, was also estimated between all SNPs. Two haplotype blocks were inferred, and association analysis at the haplotype level revealed similar effects on milk production traits. In addition, RNA expression analyses revealed that a non-synonymous coding SNP (g.4061098T>G) was involved in the regulation of gene expression. Thus, these findings provide strong evidence for associations of PTK2 variants with dairy production traits and may be applied in the Chinese Holstein breeding program.
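Pairwise D’ normalizes the disequilibrium coefficient D = p_AB − p_A·p_B by its largest possible value given the two allele frequencies (Lewontin's D’). A minimal sketch, assuming the AB haplotype frequency has already been estimated (for example by EM from unphased genotypes, not shown); LD software typically reports |D’|.

```python
def lewontin_d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Signed Lewontin's D' = D / D_max for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


print(abs(lewontin_d_prime(0.5, 0.5, 0.5)))   # complete LD: 1.0
print(abs(lewontin_d_prime(0.25, 0.5, 0.5)))  # independence: 0.0
```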

11.

Introduction

MicroRNAs (miRNAs) regulate messenger RNAs (mRNAs) and as such have been implicated in a variety of diseases, including cancer. MiRNAs regulate mRNAs through binding of the miRNA 5’ seed sequence (~7–8 nucleotides) to the mRNA 3’ UTRs; polymorphisms in these regions have the potential to alter miRNA-mRNA target associations. SNPs in miRNA genes as well as miRNA-target genes have been proposed to influence cancer risk through altered miRNA expression levels.

Methods

MiRNA-SNPs and miRNA-target gene-SNPs were identified from the literature. We used SNPs from genome-wide association study (GWAS) data, matched to individuals with miRNA expression data generated on an Agilent platform for paired colon tumor and non-tumor tissues. These samples were used to evaluate 327 miRNA-SNP pairs for associations between SNPs and miRNA expression levels, as well as for SNP associations with colon cancer.

Results

Expression of 22 miRNAs in non-tumor tissue differed significantly by genotype, and 21 SNPs were associated with altered tumor/non-tumor differential miRNA expression across genotypes. Two miRNAs were associated with SNP genotype for both non-tumor and tumor/non-tumor differential expression. Of the 41 miRNAs significantly associated with SNPs, all but seven were significantly differentially expressed in colon tumor tissue. Two of the 41 SNPs significantly associated with miRNA expression levels were associated with colon cancer risk: rs8176318 (BRCA1), OR(AA) 1.31 (95% CI 1.01–1.78), and rs8905 (PRKAR1A), OR(GG) 2.31 (95% CI 1.11–4.77).

Conclusion

Of the 327 SNPs identified in the literature as important because of their potential regulation of miRNA expression levels, 12.5% had statistically significant associations with miRNA expression. However, only two of these SNPs were significantly associated with colon cancer.

12.
Genome-wide association studies (GWAS) have successfully identified susceptibility loci through marginal association analysis of SNPs. Valuable insight into the genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to further develop gene set enrichment analysis methods, which originated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a “gene set ridge regression in association studies (GRASS)” algorithm. GRASS summarizes the genetic structure of each gene as eigenSNPs and uses a novel form of regularized regression, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD within the same gene. We show by simulation that the algorithm performs well when the number of predictors is large relative to the sample size. We applied GRASS to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the top two significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology.

13.
Multiple algorithms have been developed for calling single nucleotide polymorphisms (SNPs) from Affymetrix microarrays. We extend and validate the algorithm CRLMM, which incorporates HapMap information within an empirical Bayes framework. We find CRLMM to be more accurate than the Affymetrix default programs (BRLMM and Birdseed). We also tie our call confidence metric to percent accuracy. We intend our validation datasets and methods, referred to as SNPaffycomp, to serve as standard benchmarks for future SNP calling algorithms.

14.

Background  

Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variants, mostly single nucleotide polymorphisms (SNPs), have been identified for a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the variants discovered by GWAS explain only a small fraction of the genetic risk associated with complex diseases. New strategies and statistical approaches are needed to address this gap. One such approach is pathway analysis, which considers the genetic variants underlying a biological pathway jointly, rather than separately as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association across multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as its representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with larger numbers of SNPs.
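The bias toward SNP-rich genes can be quantified with a simple calculation: if a gene has k independent SNPs with no true effect, the chance that its smallest p-value falls below α is 1 − (1 − α)^k, which grows with k. The sketch below computes this and the corresponding Šidák adjustment of a gene's minimum p-value. The independence assumption is ours and ignores LD within genes, under which the adjustment is conservative; this is not the method of the paper.

```python
from typing import Sequence


def chance_min_p_below(alpha: float, n_snps: int) -> float:
    """P(smallest p-value < alpha) for a gene with n_snps independent null SNPs."""
    return 1 - (1 - alpha) ** n_snps


def sidak_gene_p(snp_pvalues: Sequence[float]) -> float:
    """Sidak-adjusted gene-level p-value based on the gene's smallest SNP p-value.

    Assumes the SNPs are independent; LD within a gene makes this conservative.
    """
    return 1 - (1 - min(snp_pvalues)) ** len(snp_pvalues)


for k in (1, 5, 20):
    print(k, round(chance_min_p_below(0.05, k), 3))
```

For k = 1, 5 and 20 the loop gives roughly 0.05, 0.23 and 0.64, so an unadjusted "best SNP per gene" rule flags SNP-rich genes far more often by chance alone.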

15.
We present GStream, a method that combines genome-wide SNP and CNV genotyping on the Illumina microarray platform with unprecedented accuracy. This new method outperforms previously established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves on the results of previous state-of-the-art methods and yields an accuracy close to that of purely CNV-oriented technologies such as Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using reference CNV calls from the 1000 Genomes Project (1KGP) and well-known whole-genome CNV characterization studies based on either CGH or genotyping microarrays, we show that GStream can increase the number of reliably detected variants by up to 25% compared with previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs previously associated with disease risk in published genome-wide association studies (GWAS). These results could provide important insights into the biological mechanisms underlying the detected disease-risk associations. With GStream, large-scale GWAS will benefit not only from combined genotyping of SNPs and CNVs at unprecedented accuracy but also from the computational efficiency of the method.

16.
The last decade has seen rapid improvements in high-throughput single nucleotide polymorphism (SNP) genotyping technologies, which have made genome-wide association studies (GWAS) possible. With tens to hundreds of thousands of SNP markers tested simultaneously in GWAS, it is imperative to appropriately pre-process, or filter out, SNPs that may lead to false associations. This paper explores the relationships between various SNP genotype and phenotype attributes and their effects on false associations. We show that (i) uniformly distributed ordinal data, as well as binary data, are more easily influenced (though not necessarily negatively) by differences in various SNP attributes than normally distributed data; (ii) filtering SNPs on minor allele frequency (MAF) and extent of Hardy–Weinberg equilibrium (HWE) deviation has little effect on the overall false positive rate; (iii) in some cases, filtering on MAF only serves to exclude SNPs from the analysis without reducing the overall proportion of false associations; and (iv) HWE, MAF and heterozygosity all depend on minor genotype frequency, a newly proposed measure of genotype integrity.
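The SNP attributes discussed above can all be computed from the three genotype counts. A minimal sketch of MAF, observed heterozygosity and a one-degree-of-freedom chi-square test for HWE deviation; the paper's "minor genotype frequency" measure is not implemented because its exact definition is not given in this abstract, and for small counts an exact HWE test is usually preferred over the chi-square approximation.

```python
import math


def snp_qc(n_aa: int, n_ab: int, n_bb: int) -> dict:
    """MAF, observed heterozygosity and a 1-df chi-square HWE test from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    maf = min(p, 1 - p)
    het = n_ab / n                            # observed heterozygosity
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_hwe = math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square with 1 df
    return {"maf": maf, "het": het, "hwe_chi2": chi2, "hwe_p": p_hwe}


print(snp_qc(50, 0, 50))  # no heterozygotes: a strong HWE deviation
```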

17.
To understand the relationship between genetic and phenotypic variation in cultivated tomato, we developed single nucleotide polymorphism (SNP) markers covering the whole genome of cultivated tomato and performed genome-wide association studies (GWAS). The whole genomes of six tomato lines were sequenced with the ABI 5500xl SOLiD sequencer. Sequence reads covering ∼13.7× of the genome were obtained for each line and mapped onto the tomato reference genome (SL2.40), detecting ∼1.5 million SNP candidates. Of the identified SNPs, 1.5% were predicted to affect gene function. In the subsequent Illumina GoldenGate assay of 1536 SNPs, 1293 SNPs were successfully genotyped, and 1248 were polymorphic among 663 tomato accessions. Whole-genome linkage disequilibrium (LD) analysis detected highly biased LD decay between euchromatic (58 kb) and heterochromatic (13.8 Mb) regions. Subsequent GWAS identified SNPs significantly associated with agronomic traits, with SNP loci located near genes previously reported as candidates for these traits. This study demonstrates that attractive loci can be identified by performing GWAS with a large number of SNPs obtained from re-sequencing analysis.

18.
Genome-wide association studies (GWASs) assess correlation between traits and DNA sequence variation using large numbers of genetic variants, such as single nucleotide polymorphisms (SNPs), distributed across the genome. A GWAS produces many trait-SNP associations with low p-values, but few are replicated in subsequent studies. We sought to determine whether characteristics of the genomic loci associated with a trait could be used to identify initial associations with a higher chance of replication in a second cohort. We used data on 100,000 SNPs from the Age-Related Eye Disease Study (AREDS) in 395 subjects with and 198 without age-related macular degeneration (AMD). Loci highly associated with AMD were characterized by the distribution of genotypes, the level of significance, and the clustering of adjacent SNPs also associated with AMD, which suggests linkage disequilibrium or multiple effects. Forty-nine loci were highly associated with AMD, including 3 loci (CFH, C2/BF, LOC387715/HTRA1) already known to contain important genetic risks for AMD. One additional locus (C3), reported during the course of this study, was identified and replicated in an additional study group. Tag-SNPs and haplotypes for each locus were evaluated for association with AMD in additional cohorts to account for population differences between discovery and replication subjects, but no additional clearly significant associations were identified. Relying on a significant genotype test under a log-additive model would have excluded 57% of the non-replicated loci and none of the replicated loci, whereas use of other SNP features and clustering might have missed true associations.
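A common way to test genotypes under a log-additive model is the Cochran–Armitage trend test, which is essentially the score test of a logistic regression with genotype coded 0/1/2. The abstract does not say exactly which test the AREDS analysis used, so the sketch below only illustrates the statistic, with hypothetical counts.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for genotype counts ordered by allele dosage.

    cases, controls: counts for genotypes carrying 0, 1 and 2 copies of the allele.
    Returns (z, chi-square with 1 df, two-sided p-value).
    """
    n = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    t_r = sum(t * r for t, r in zip(weights, cases))
    t_n = sum(t * m for t, m in zip(weights, n))
    t2_n = sum(t * t * m for t, m in zip(weights, n))
    stat = N * t_r - R * t_n                  # sum_i t_i (r_i * S - s_i * R)
    var = R * S / N * (N * t2_n - t_n ** 2)   # null variance of the statistic
    z = stat / math.sqrt(var)
    return z, z * z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: cases enriched for the 2-copy genotype
print(trend_test((10, 20, 30), (30, 20, 10)))
```

A positive z means the tested allele is more frequent in cases; the test has 1 degree of freedom regardless of how the three genotype classes are distributed.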

19.
Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, it is unknown whether these associations can be consistently replicated, vary with disease aggressiveness (tumor stage and grade), and/or interact with potential non-genetic risk factors or other SNPs. We therefore genotyped 39 SNPs from regions identified by several prostate cancer GWAS in 10,501 prostate cancer cases and 10,831 controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We replicated 36 of the 39 SNPs (P-values ranging from 0.01 to 10⁻²⁸). Two SNPs near KLK3 that are associated with PSA levels showed differential association with Gleason grade (rs2735839, P = 0.0001 and rs266849, P = 0.0004; case-only test): the alleles associated with decreasing PSA levels were inversely associated with low-grade tumors (Gleason grade <8) but positively associated with high-grade tumors. No other SNP showed differential association by disease stage or grade. We observed no effect modification by SNP for association with age at diagnosis, family history of prostate cancer, diabetes, BMI, height, smoking or alcohol intake. Moreover, we found no evidence of pairwise SNP-SNP interactions. While these SNPs represent new independent risk factors for prostate cancer, we saw little evidence of effect modification by other SNPs or by the environmental factors examined.

20.
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits. However, a stringent significance threshold is required to identify robust genetic associations. Leveraging relevant auxiliary covariates has the potential to boost statistical power to exceed this threshold. In particular, abundant pleiotropy and the non-random distribution of SNPs across functional categories suggest that leveraging GWAS test statistics from related traits and/or functional genomic data may boost GWAS discovery. While type 1 error rate control has become standard in GWAS, control of the false discovery rate (FDR) can be a more powerful approach. The conditional false discovery rate (cFDR) extends the standard FDR framework by conditioning on auxiliary data to call significant associations, but current implementations are restricted to auxiliary data satisfying specific parametric distributions, typically GWAS p-values for related traits. We relax these distributional assumptions, enabling an extension of the cFDR framework that supports auxiliary covariates from arbitrary continuous distributions (“Flexible cFDR”). Our method can be applied iteratively, thereby supporting multi-dimensional covariate data. Through simulations, we show that Flexible cFDR increases sensitivity while controlling FDR after one or several iterations. We further demonstrate its practical potential through application to an asthma GWAS, leveraging various functional genomic data to find additional genetic associations for asthma, which we validate in the larger, independent UK Biobank data resource.
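Flexible cFDR itself is beyond a short example, but the standard FDR framework it extends can be sketched with the Benjamini–Hochberg step-up procedure: sort the m p-values, find the largest rank k with p_(k) ≤ k·q/m, and reject the k smallest. This is the classical procedure, not the authors' method, and the example p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


pvals = [0.04, 0.04, 0.04, 0.04]
print(benjamini_hochberg(pvals))                           # BH rejects all four
print([i for i, p in enumerate(pvals) if p <= 0.05 / 4])   # Bonferroni rejects none
```

The contrast in the usage lines illustrates why FDR control can be more powerful than family-wise (Bonferroni-style) control when many tests carry modest signal.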


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417