Similar articles
 20 similar articles found; search time 31 ms
1.
The last decade has seen rapid improvements in high-throughput single nucleotide polymorphism (SNP) genotyping technologies that have consequently made genome-wide association studies (GWAS) possible. With tens to hundreds of thousands of SNP markers being tested simultaneously in GWAS, it is imperative to appropriately pre-process, or filter out, those SNPs that may lead to false associations. This paper explores the relationships between various SNP genotype and phenotype attributes and their effects on false associations. We show that (i) uniformly distributed ordinal data as well as binary data are more easily influenced, though not necessarily negatively, by differences in various SNP attributes compared with normally distributed data; (ii) filtering SNPs on minor allele frequency (MAF) and extent of Hardy–Weinberg equilibrium (HWE) deviation has little effect on the overall false positive rate; (iii) in some cases, filtering on MAF only serves to exclude SNPs from the analysis without reduction of the overall proportion of false associations; and (iv) HWE, MAF and heterozygosity are all dependent on minor genotype frequency, a newly proposed measure for genotype integrity.
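As a rough illustration of the two filters discussed in this abstract, the following Python sketch computes the minor allele frequency and a chi-square statistic for deviation from Hardy–Weinberg equilibrium from the three genotype counts of one biallelic SNP. The function name, the genotype-count inputs, and the use of a plain Pearson chi-square statistic are assumptions made for illustration; the abstract does not say which HWE test or thresholds the authors used.

```python
def maf_and_hwe(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, HWE chi-square) for one biallelic SNP.

    n_aa, n_ab, n_bb are hypothetical genotype counts (AA, AB, BB).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies from genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    maf = min(p, q)

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square; cells with zero expectation are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return maf, chi2
```

For example, counts of 25, 50 and 25 give a MAF of 0.5 and a statistic of 0, while 50, 0 and 50 (no heterozygotes at all) give a statistic of 100.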

2.
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs using "pooled" data and compared them with "true" frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.
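To make the "binomial sampling" statement concrete, here is a minimal Python sketch of the standard error expected if every read covering a site were an independent draw from the pool. The function name and arguments are hypothetical, and the sketch deliberately ignores the unequal DNA contribution of individual strains that the abstract identifies as an extra source of noise.

```python
import math


def binomial_se(p, depth):
    """Standard error of an allele-frequency estimate from `depth` reads.

    Assumes each read is an independent Bernoulli draw with success
    probability p (the true pool allele frequency).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(p * (1.0 - p) / depth)
```

Under these assumptions, a site at frequency 0.5 covered by 100 reads has a standard error of about 0.05, and quadrupling the depth halves it.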

3.
A large number of putative single nucleotide polymorphisms (SNPs) have been identified from the bovine genome-sequencing project. However, few of these have been validated and many will turn out to be sequencing artefacts or have low minor allele frequencies. In addition, there is little information available on SNPs within coding regions, which are likely to be responsible for phenotypic variation. Therefore, additional SNP discovery is necessary to identify and validate polymorphisms both in specific genes and genome-wide. Sequence-tagged sites within 286 genes were resequenced from a panel of animals representing a wide range of European cattle breeds. For 80 genes, no polymorphisms were identified, and 672 putative SNPs were identified within 206 genes. Fifteen European cattle breeds (436 individuals plus available parents) were genotyped with these putative SNPs, and 389 SNPs were confirmed to have minor allele frequencies above 10%. The genes containing SNPs were localized on chromosomes by radiation hybrid mapping and on the bovine genome sequence by BLAST. Flanking microsatellite loci were identified, to facilitate the alignment of the genes containing the SNPs in relation to mapped quantitative trait loci. Of the 672 putative SNPs discovered in this work, only 11 were found among the validated SNPs and 100 were found among the approximately 2.3 million putative SNPs currently in dbSNP. The genes studied in this work could be considered as candidates for traits associated with beef production and the SNPs reported will help to assess the role of the genes in the genetic control of muscle development and meat quality. The allele frequency data presented allow the general utility of the SNPs to be assessed.

4.
At present, the cost of genotyping single nucleotide polymorphisms (SNPs) in large numbers of subjects poses a formidable problem for molecular genetic approaches to complex diseases. We have tested the possibility of using primer extension and denaturing high performance liquid chromatography to estimate allele frequencies of SNPs in pooled DNA samples. Our data show that this method should allow the accurate estimation of absolute allele frequencies in pooled samples of DNA and also of the difference in allele frequency between different pooled DNA samples. This technique therefore offers an efficient and cheap method for genotyping SNPs in large case-control and family-based association samples.

5.
Zhang W, Duan S, Dolan ME. Bioinformation 2008, 2(8): 322-324
The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in various association studies to identify the genetic determinants of phenotypic variation. Prior to the association studies, the HapMap dataset should be preprocessed in order to reduce the computation time and control the multiple testing problem. The less informative SNPs, including those with a very low genotyping rate and those with low minor allele frequencies in one or more populations, are removed. Some research designs only use SNPs in a subset of HapMap cell lines. Although the HapMap website and other association software packages have provided some basic tools for optimizing these datasets, a fast and user-friendly program to generate the output for filtered genotypic data would be beneficial for association studies. Here, we present a flexible, straightforward bioinformatics program that can be useful in preparing the HapMap genotypic data for association studies by specifying cell lines and two common filtering criteria: minor allele frequencies and genotyping rate. The software was developed for Microsoft Windows and written in C++. AVAILABILITY: The Windows executable and source code in Microsoft Visual C++ are available at Google Code (http://hapmap-filter-v1.googlecode.com/) or upon request. Their distribution is subject to the GNU General Public License v3.
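The filtering steps described in this abstract (restrict to chosen cell lines, then drop SNPs with a low genotyping rate or low MAF) can be sketched in Python as follows. This is not the authors' C++ program: the function name, the in-memory data layout, the default thresholds, and the use of "NN" as the missing-genotype code are all assumptions made for illustration, and SNPs are assumed to be biallelic.

```python
from collections import Counter


def filter_snps(genotypes, sample_ids, keep_samples,
                min_maf=0.05, min_call_rate=0.95, missing="NN"):
    """Return IDs of SNPs passing call-rate and MAF filters.

    genotypes:    dict mapping SNP ID -> list of two-letter calls ("AG"),
                  one per entry of sample_ids (hypothetical layout).
    keep_samples: set of sample (cell line) IDs to include.
    """
    keep_idx = [i for i, s in enumerate(sample_ids) if s in keep_samples]
    passed = []
    for snp_id, calls in genotypes.items():
        subset = [calls[i] for i in keep_idx]
        called = [g for g in subset if g != missing]

        # Genotyping (call) rate among the selected samples.
        if not subset or len(called) / len(subset) < min_call_rate:
            continue

        # Count alleles; the second most common allele is the minor one.
        counts = sorted(Counter("".join(called)).values(), reverse=True)
        maf = counts[1] / sum(counts) if len(counts) > 1 else 0.0

        if maf >= min_maf:
            passed.append(snp_id)
    return passed
```

Because the filters are applied after subsetting, a SNP that fails on the full panel can still pass within a chosen subset of cell lines, and vice versa.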

6.
Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays   (cited by: 1; self-citations: 0; citations by others: 1)
We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.

7.
8.
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
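The abstract does not spell out its regression-based method, so the Python sketch below shows only one generic possibility: a two-source least-squares fit in which the pooled frequency at each SNP is modelled as m times the frequency in reference panel A plus (1 - m) times the frequency in reference panel B. The function name, the two-population restriction, and the absence of any weighting are assumptions, not the authors' procedure.

```python
def admixture_proportion(mixed, ref_a, ref_b):
    """Least-squares estimate of the fraction of ancestry from panel A.

    mixed, ref_a, ref_b: equal-length sequences of allele frequencies
    for the same SNPs (pooled cohort, reference A, reference B).
    Model: mixed[i] = m * ref_a[i] + (1 - m) * ref_b[i] + noise.
    """
    if not (len(mixed) == len(ref_a) == len(ref_b)):
        raise ValueError("frequency vectors must have equal length")

    # Regress (mixed - ref_b) on (ref_a - ref_b) through the origin.
    num = sum((f - b) * (a - b) for f, a, b in zip(mixed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference panels are identical at every SNP")
    return num / den
```

SNPs where the two reference panels differ most dominate the denominator, which matches the abstract's emphasis on markers with large allele frequency differences (AIMs).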

9.
Here we report a large, extensively characterized set of single-nucleotide polymorphisms (SNPs) covering the human genome. We determined the allele frequencies of 55,018 SNPs in African Americans, Asians (Japanese-Chinese), and European Americans as part of The SNP Consortium's Allele Frequency Project. A subset of 8333 SNPs was also characterized in Koreans. Because these SNPs were ascertained in the same way, the data set is particularly useful for modeling. Our results document that much genetic variation is shared among populations. For autosomes, some 44% of these SNPs have a minor allele frequency ≥10% in each population, and the average allele frequency differences between populations with different continental origins are less than 19%. However, the several percentage point allele frequency differences among the closely related Korean, Japanese, and Chinese populations suggest caution in using mixtures of well-established populations for case-control genetic studies of complex traits. We estimate that approximately 7% of these SNPs are private SNPs with minor allele frequencies <1%. A useful set of characterized SNPs with large allele frequency differences between populations (>60%) can be used for admixture studies. High-density maps of high-quality, characterized SNPs produced by this project are freely available.

10.
We show that single-nucleotide polymorphisms (SNPs) of moderate to high heterozygosity (minor allele frequencies >10%) can be efficiently detected, and their allele frequencies accurately estimated, by pooling the DNA samples and applying a capillary-based SSCP analysis. In this method, alleles are separated into peaks, and their frequencies can be reliably and accurately quantified from their peak heights (SD <1.8%). We found that as many as 40% of publicly available SNPs that were analyzed by this method have widely differing allele frequency distributions among groups of different ethnicity (parents of Centre d'Etude Polymorphisme Humaine families vs. Japanese individuals). These results demonstrate the effectiveness of the present pooling method in the reevaluation of candidate SNPs that have been collected by examination of limited numbers of individuals. The method should also serve as a robust quantitative technique for studies in which a precise estimate of SNP allele frequencies is essential, for example, in linkage disequilibrium analysis.

11.
Xiao M, Latif SM, Kwok PY. BioTechniques 2003, 34(1): 190-197
Strategies for identifying genetic risk factors in complex diseases by association studies require the comparison of allele frequencies of numerous SNPs between affected and control populations. Theoretically, hundreds of thousands of SNP markers across the genome will have to be genotyped in these studies. Genotyping SNPs one sample at a time is extremely costly and time consuming. To streamline whole genome association studies, some have proposed to screen SNPs by pooling the DNA samples initially for allele frequency determination and perform individual genotyping only when there is a significant discrepancy in allele frequencies between the affected and control populations. Here we describe a new method for determining the allele frequency of SNPs in pooled DNA samples using a two-color primer extension assay with real-time monitoring of fluorescence polarization (named kinetic FP-TDI assay). By comparing the ratio of the rate of incorporation of the two allele-specific dye-terminators, one can calculate the relative amounts of each allele in the pooled sample. The accuracy of allele frequency determination with pooled samples is within 3.3 ± 0.8% of that determined by genotyping individual samples that make up the pool.
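The ratio calculation mentioned in this abstract can be illustrated with a short Python sketch that turns two incorporation rates into an allele frequency. The function name and arguments are hypothetical, and the optional normalization by the rate ratio seen in a known heterozygote (a 1:1 allele mix) is an assumption borrowed from common pooled-DNA practice; the abstract does not say how the authors calibrated the two dyes.

```python
def allele_frequency_from_rates(rate_a, rate_b, het_ratio=1.0):
    """Estimate the frequency of allele A in a DNA pool.

    rate_a, rate_b: incorporation rates of the two allele-specific
                    dye-terminators in the pooled sample.
    het_ratio:      rate_a / rate_b measured on a heterozygous sample,
                    used to correct unequal dye incorporation
                    (assumed calibration step; 1.0 means no correction).
    """
    if rate_a < 0 or rate_b < 0:
        raise ValueError("rates must be non-negative")
    if het_ratio <= 0:
        raise ValueError("het_ratio must be positive")

    corrected_a = rate_a / het_ratio
    total = corrected_a + rate_b
    if total == 0:
        raise ValueError("both rates are zero")
    return corrected_a / total
```

With no correction, rates of 3 and 1 give a frequency of 0.75 for allele A; if a heterozygote itself shows a 2:1 rate ratio, pooled rates of 2 and 1 correspond to a frequency of 0.5.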

12.
Although single nucleotide polymorphisms (SNPs) are commonly used in human genetics, they have only recently been incorporated into genetic studies of non‐model organisms, including cetaceans. SNPs have several advantages over other molecular markers for studies of population genetics: they are quicker and more straightforward to score, cross‐laboratory comparisons of data are less complicated, and they can be used successfully with low‐quality DNA. We screened portions of the genome of one of the most abundant cetaceans in U.S. waters, the common bottlenose dolphin (Tursiops truncatus), and identified 153 SNPs resulting in an overall average of one SNP every 463 base pairs. Custom TaqMan® Assays were designed for 53 of these SNPs, and their performance was tested by genotyping a set of bottlenose dolphin samples, including some with low‐quality DNA. We found that in 19% of the loci examined, the minor allele frequency (MAF) estimated during initial SNP ascertainment using a DNA pool of 10 individuals differed significantly from the final MAF after genotyping over 100 individuals, suggesting caution when making inferences about MAF values based on small data sets. For two assays, we also characterized the basis for unusual clustering patterns to determine whether their data could still be utilized for further genetic studies. Overall results support the use of these SNPs for accurate analysis of both poor and good‐quality DNA. We report the first SNP markers and genotyping assays for use in population and conservation genetic studies of bottlenose dolphins.

13.
14.
Single nucleotide polymorphisms (SNPs) are widely used when investigators try to map complex disease genes. Although biallelic SNP markers are less informative than microsatellite markers, one can increase their information content by using haplotypes. However, assigning haplotypes (i.e., assigning phase) correctly can be problematic in the presence of SNP heterozygosity. For example, a doubly heterozygous individual, with genotype 12, 12, could have haplotypes 1-1/2-2 or 1-2/2-1 with equal probability; in the absence of additional information, there is no way to determine which haplotype is correct. Thus an algorithm that assigns haplotypes to such an individual will assign the wrong one 50% of the time. We have studied the frequency of haplotype misassignments, i.e., haplotypes that are misassigned solely because of inherent marker ambiguity (not because of errors in genotyping or calculation). We examined both SNPs and microsatellite markers. We used the computer programs GENEHUNTER and SIMWALK to assign the haplotypes. We simulated (a) families with 1-5 children, (b) haplotypes involving different numbers of marker loci (3, 5, 7 and 10 loci, all in linkage equilibrium), and (c) different allele frequencies. Misassignment rates are highest (a) in small families, (b) with many SNP loci, and (c) for loci with the greatest heterozygosity (i.e., where both alleles have frequency 0.5). For example, for triads (i.e., one-child families with both parents genotyped), misassignment rates for SNPs can reach almost 50%. Family sizes of 4-5 children are required in order to ensure a misassignment frequency of ≤5% for ten-SNP haplotypes with allele frequencies of 0.25-0.5. For microsatellites, a family size of at least 2-3 children is necessary to keep haplotyping misassignments ≤5%.
Finally, we point out that it is misleading for a computer program to yield haplotype assignments without indicating that they may have been misassigned, and we discuss the implications of these misassignments for association and linkage analysis.
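The phase ambiguity in the doubly heterozygous example can be reproduced with a small Python sketch that enumerates every pair of haplotypes consistent with one individual's unphased genotype. The function name and the representation of a genotype as a list of allele pairs are assumptions for illustration; the sketch uses no family information and is unrelated to how GENEHUNTER or SIMWALK work internally.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List all unordered haplotype pairs consistent with a genotype.

    genotype: list of (allele, allele) tuples, one per locus,
              e.g. [(1, 2), (1, 2)] for a double heterozygote.
    """
    pairs = set()
    # At each locus, either keep or swap which haplotype gets which allele.
    for flips in product((False, True), repeat=len(genotype)):
        hap1 = tuple(b if flip else a for (a, b), flip in zip(genotype, flips))
        hap2 = tuple(a if flip else b for (a, b), flip in zip(genotype, flips))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as one pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)
```

For the genotype 12, 12 this returns exactly the two resolutions named in the abstract, 1-1/2-2 and 1-2/2-1. In general, an individual heterozygous at h loci has 2^(h-1) compatible pairs, which is why ambiguity grows quickly with the number of SNP loci.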

15.
High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, with a genotypic and sequence allele frequency correlation of r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.

16.
DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first questioned whether published genome-wide association study (GWAS) assays cover all regions of the genome well. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 Kb that do not contain any common SNPs, i.e., those with minor allele frequency (MAF) ≥1%. Secondly, because conserved regions are generally of functional importance, we tested genes in those large genomic regions without common SNPs. We found 97 genes, which were enriched for reproductive function. In addition, we further filtered out regions with CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from the Human Genome Project and common variants identified by personal genome sequencing (UCSC). No region survived this filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some “holes” in the current human genomic map for common SNPs. Because GWAS only focused on common SNPs, interpretation of GWAS results should take this limitation into account. In particular, two recent GWAS of fertility may be incomplete due to the map deficit. Additional SNP discovery efforts should pay close attention to these regions.
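The search for long regions without common SNPs can be sketched in Python as a scan over sorted SNP positions on one chromosome. The function name and input layout are hypothetical; the 1% MAF cutoff and 100 Kb minimum length come from the abstract, while treating the chromosome start and end as region boundaries is an assumption of this sketch.

```python
def regions_without_common_snps(snps, chrom_length,
                                min_maf=0.01, min_gap=100_000):
    """Return (start, end) intervals longer than min_gap with no common SNP.

    snps:         iterable of (position, maf) tuples on one chromosome.
    chrom_length: length of the chromosome in base pairs.
    """
    # Keep only common SNPs and order them along the chromosome.
    common = sorted(pos for pos, maf in snps if maf >= min_maf)

    # Chromosome ends act as boundaries (assumption of this sketch).
    boundaries = [0] + common + [chrom_length]

    gaps = []
    for start, end in zip(boundaries, boundaries[1:]):
        if end - start > min_gap:
            gaps.append((start, end))
    return gaps
```

Rare SNPs (MAF below the cutoff) are ignored when locating gaps, so a region can contain known variants and still be reported as free of common SNPs.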

17.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.

18.
Single-nucleotide polymorphisms (SNPs) are considered useful polymorphic markers for genetic studies of polygenic traits. A new practical approach to high-throughput genotyping of SNPs in a large number of individuals is needed in association studies and other studies on relationships between genes and diseases. We have developed an accurate and high-throughput method for determining allele frequencies by pooling the DNA samples and applying a DNA microarray hybridization analysis. In this method, the combination of the microarray, DNA pooling, probe pair hybridization, and fluorescent ratio analysis solves the dual problems of parallel multiple sample analysis and parallel multiplex SNP genotyping for association studies. Multiple DNA samples are immobilized on a slide and a single hybridization is performed with a pool of allele-specific oligonucleotide probes. The results of this study show that microarray hybridization of pooled DNA samples yields accurate estimates of absolute allele frequencies in a sample pool. This method can also be used to identify differences in allele frequencies in distinct populations. It is amenable to automation and is suitable for immediate use in high-throughput genotyping of SNPs.

19.

Background

There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C).

Methodology/Principal Findings

A 384-SNP GoldenGate genotyping array was built from (i) 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium, and (ii) 200 SNPs screened from ESTs (in silico SNPs), selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate was very low (0.54%, dropping to 0.06% when four SNPs showing elevated error rates were removed).

Conclusions/Significance

This study demonstrates that ESTs provide a resource for SNP identification in non-model species that requires no additional bench work and little bioinformatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per-SNP error rate should help identify markers that are not suitable for the GoldenGate technology in species characterized by a large and complex genome.

20.
The dog is an attractive model for genetic studies of complex disease. With drafts of the canine genome complete, a large number of single-nucleotide polymorphisms (SNPs) that are potentially useful for gene-mapping studies and empirical estimations of canine diversity and linkage disequilibrium (LD) are now available. Unfortunately, most canine SNPs remain uncharacterized, and the amount and quality of DNA available from population-based samples are limited. We assessed how these real-world challenges influence automated SNP genotyping methods such as Illumina's GoldenGate assay. We examined 384 SNPs on canine chromosome 9 and successfully genotyped a minimum of 217 and a maximum of 275 SNPs using buccal swab samples for 181 dogs (86 beagles, 76 border collies, and 15 Australian shepherds). Call rates per SNP and sample averaged 97%, with reproducibility within and between analyses averaging 98%. The majority of these SNPs were polymorphic across all 3 breeds. We observed extensive LD, albeit less than reported for surveys using fewer dogs, consistent between breeds. Analyses of population substructure indicated that beagles are distinct from border collies and Australian shepherds. These results demonstrate the suitability of amplified canine buccal samples for high-throughput multiplex genotyping and confirm extensive LD in the dog.
