1.

Background

Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common types of polymorphism and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present. Haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because of the continuously increasing number of sequences deposited in databases, which allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false positive SNPs and also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP.

Results

HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy.

Conclusion

Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for visualization of SNPs and alleles. HaploSNPer is available from http://www.bioinformatics.nl/tools/haplosnper/.

2.
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs that are hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, target single loci, and be suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases with or without quality information using QualitySNP software, select reliable SNPs and prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs on the SNP array approximately 12% dropped out. For the two potato mapping populations, 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.

3.
4.
Here we report a large, extensively characterized set of single-nucleotide polymorphisms (SNPs) covering the human genome. We determined the allele frequencies of 55,018 SNPs in African Americans, Asians (Japanese-Chinese), and European Americans as part of The SNP Consortium's Allele Frequency Project. A subset of 8333 SNPs was also characterized in Koreans. Because these SNPs were ascertained in the same way, the data set is particularly useful for modeling. Our results document that much genetic variation is shared among populations. For autosomes, some 44% of these SNPs have a minor allele frequency ≥10% in each population, and the average allele frequency differences between populations with different continental origins are less than 19%. However, the several percentage point allele frequency differences among the closely related Korean, Japanese, and Chinese populations suggest caution in using mixtures of well-established populations for case-control genetic studies of complex traits. We estimate that approximately 7% of these SNPs are private SNPs with minor allele frequencies <1%. A useful set of characterized SNPs with large allele frequency differences between populations (>60%) can be used for admixture studies. High-density maps of high-quality, characterized SNPs produced by this project are freely available.
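
The thresholds quoted in this abstract (minor allele frequency of at least 10% in every population, between-population differences above 60% for admixture work) can be illustrated with a minimal sketch. The function names, the dictionary layout and the example frequencies are hypothetical and are not taken from The SNP Consortium's data or software.

```python
from typing import Dict


def minor_allele_frequency(count_a: int, count_b: int) -> float:
    """Frequency of the rarer of two alleles, given their observed counts."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


def classify_snp(freq_by_pop: Dict[str, float]) -> Dict[str, bool]:
    """Classify one SNP from the frequency of the same allele in each population.

    The 0.10 and 0.60 cut-offs mirror the figures quoted in the abstract;
    everything else here is an illustrative assumption.
    """
    if not freq_by_pop:
        raise ValueError("at least one population is required")
    freqs = list(freq_by_pop.values())
    mafs = [min(f, 1.0 - f) for f in freqs]
    largest_difference = max(freqs) - min(freqs)
    return {
        "common_in_all": all(m >= 0.10 for m in mafs),
        "admixture_informative": largest_difference > 0.60,
    }


if __name__ == "__main__":
    # Made-up frequencies for two hypothetical SNPs.
    print(classify_snp({"pop1": 0.45, "pop2": 0.30, "pop3": 0.52}))
    print(classify_snp({"pop1": 0.05, "pop2": 0.80}))
```

A SNP can satisfy one criterion without the other: the second example is useful for admixture studies but is not common in every population.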

5.
Positional cloning of genes underlying complex diseases, such as type 2 diabetes mellitus (T2DM), typically follows a two-tiered process in which a chromosomal region is first identified by genome-wide linkage scanning, followed by association analyses using densely spaced single nucleotide polymorphic markers to identify the causal variant(s). The success of genome-wide single nucleotide polymorphism (SNP) detection has resulted in a vast number of potential markers available for use in the construction of such dense SNP maps. However, the cost of genotyping large numbers of SNPs in appropriately sized samples is nearly prohibitive. We have explored pooled DNA genotyping as a means of identifying differences in allele frequency between pools of individuals with T2DM and unaffected controls by using Pyrosequencing technology. We found that allele frequencies in pooled DNA were strongly correlated with those in individuals (r=0.99, P<0.0001) across a wide range of allele frequencies (0.02-0.50). We further investigated the sensitivity of this method to detect allele frequency differences between contrived pools, also over a wide range of allele frequencies. We found that Pyrosequencing was able to detect an allele frequency difference of less than 2% between pools, indicating that this method may be sensitive enough for use in association studies involving complex diseases where a small difference in allele frequency between cases and controls is expected.
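
As a rough illustration of the comparison described here, the sketch below estimates an allele frequency in a pool from two allele-specific signals and correlates pooled estimates with frequencies obtained from individual genotypes. It assumes the pooled frequency is simply the share of the total signal attributable to one allele, which is a simplification and not necessarily how the authors processed their Pyrosequencing data; all names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pooled_allele_frequency(signal_a: float, signal_b: float) -> float:
    """Estimate the frequency of allele A as its share of the total signal."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("signals must sum to a positive value")
    return signal_a / total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up values: frequencies from individual genotyping versus
    # frequencies estimated from the corresponding pools.
    individual = [0.02, 0.10, 0.25, 0.50]
    pooled = [pooled_allele_frequency(a, b)
              for a, b in [(3, 97), (11, 89), (24, 76), (51, 49)]]
    print(pearson_r(individual, pooled))
```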

6.
SNPper: retrieval and analysis of human SNPs
MOTIVATION: Single Nucleotide Polymorphisms (SNPs) are an increasingly important tool for the study of the human genome. SNPs can be used as markers to create high-density genetic maps, as causal candidates for diseases, or to reconstruct the history of our genome. SNP-based studies rely on the availability of large numbers of validated, high-frequency SNPs whose position on the chromosomes is known with precision. Although large collections of SNPs exist in public databases, researchers need tools to effectively retrieve and manipulate them. RESULTS: We describe the implementation and usage of SNPper, a web-based application to automate the tasks of extracting SNPs from public databases, analyzing them and exporting them in formats suitable for subsequent use. Our application is oriented toward the needs of candidate-gene, whole-genome and fine-mapping studies, and provides several flexible ways to present and export the data. The application has been publicly available for over a year, and has received positive user feedback and high usage levels.

7.
SNPselector: a web tool for selecting SNPs for genetic association studies
SUMMARY: Single nucleotide polymorphisms (SNPs) are commonly used for association studies to find genes responsible for complex genetic diseases. With the recent advance of SNP technology, researchers are able to assay thousands of SNPs in a single experiment. But the process of manually choosing thousands of genotyping SNPs for tens or hundreds of genes is time consuming. We have developed a web-based program, SNPselector, to automate the process. SNPselector takes a list of gene names or a list of genomic regions as input and searches the Ensembl genes or genomic regions for available SNPs. It prioritizes these SNPs on their tagging for linkage disequilibrium, SNP allele frequencies and source, function, regulatory potential and repeat status. SNPselector outputs results as compressed Excel spreadsheet files for review by the user. AVAILABILITY: SNPselector is freely available at http://primer.duhs.duke.edu/

8.

Background

Public SNP databases are frequently used to choose SNPs for candidate genes in the association and linkage studies of complex disorders. However, their utility for such studies of diseases with ethnic-dependent background has never been evaluated.

Results

To estimate the accuracy and completeness of SNP public databases, we analyzed the allele frequencies of 41 SNPs in 10 candidate genes for obesity and/or osteoporosis in a large American-Caucasian sample (1,873 individuals from 405 nuclear families) by PCR-invader assay. We compared our results with those from the databases and other published studies. Of the 41 SNPs, 8 were monomorphic in our sample. Twelve were reported for the first time for Caucasians and the other 29 SNPs in our sample essentially confirmed the respective allele frequencies for Caucasians in the databases and previous studies. The comparison of our data with other ethnic groups showed significant differentiation between the three major world ethnic groups at some SNPs (Caucasians and Africans differed at 3 of the 18 shared SNPs, and Caucasians and Asians differed at 13 of the 22 shared SNPs). This genetic differentiation may have an important implication for studying the well-known ethnic differences in the prevalence of obesity and osteoporosis, and complex disorders in general.

Conclusion

A comparative analysis of the SNP data of the candidate genes obtained in the present study, as well as those retrieved from the public domain, suggests that the databases may currently have serious limitations for studying complex disorders with an ethnic-dependent background due to the incomplete and uneven representation of the candidate SNPs in the databases for the major ethnic groups. This conclusion attests to the imperative necessity of large-scale and accurate characterization of these SNPs in different ethnic groups.

9.
Single nucleotide polymorphisms (SNPs) have become an important type of marker for commercial diagnostic and parentage genotyping applications as automated genotyping systems have been developed that yield accurate genotypes. Unfortunately, allele frequencies for public SNP markers in commercial pig populations have not been available. To fulfil this need, SNP markers previously mapped in the USMARC swine reference population were tested in a panel of 155 boars that were representative of US purebred Duroc, Hampshire, Landrace and Yorkshire populations. Multiplex assay groups of 5-7 SNP assays/group were designed and genotypes were determined using Sequenom's MassARRAY system. Of 80 SNPs that were evaluated, 60 SNPs with minor allele frequencies >0.15 were selected for the final panel of markers. Overall identity power across breeds was 4.6 × 10⁻²³, but within-breed values ranged from 4.3 × 10⁻¹⁴ (Hampshire) to 2.6 × 10⁻²² (Yorkshire). Parentage exclusion probability with only one sampled parent was 0.9974 (all data) and ranged from 0.9594 (Hampshire) to 0.9963 (Yorkshire) within breeds. Sire exclusion probability when the dam's genotype was known was 0.99998 (all data) and ranged from 0.99868 (Hampshire) to 0.99997 (Yorkshire) within breeds. Power of exclusion was compared between the 60 SNP and 10 microsatellite markers. The parental exclusion probabilities for SNP and microsatellite marker panels were similar, but the SNP panel was much more sensitive for individual identification. This panel of SNP markers is theoretically sufficient for individual identification of any pig in the world and is publicly available.
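
To show why a panel of 60 biallelic markers can reach such small identity values, here is a minimal sketch of the textbook probability of identity: the chance that two unrelated individuals share a genotype at a locus, multiplied across loci. It assumes Hardy-Weinberg proportions and independent loci, and it assumes that the "identity power" reported above is a quantity of this kind; the abstract does not give the authors' exact formula, so this is not a reproduction of their calculation.

```python
from math import prod
from typing import Iterable


def locus_identity_probability(p: float) -> float:
    """Probability that two unrelated individuals share a genotype at one
    biallelic locus with allele frequencies p and 1 - p (Hardy-Weinberg)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    # Sum of squared genotype frequencies: (p^2)^2 + (2pq)^2 + (q^2)^2.
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def panel_identity_probability(allele_freqs: Iterable[float]) -> float:
    """Combined probability of identity over independent loci."""
    return prod(locus_identity_probability(p) for p in allele_freqs)


if __name__ == "__main__":
    # Hypothetical panel: 60 loci, all with allele frequency 0.5.
    print(panel_identity_probability([0.5] * 60))
```

Under this model, loci with allele frequencies closer to 0.5 contribute a smaller per-locus value, which is consistent with selecting markers above a minor allele frequency threshold.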

10.
In this study, we describe the first set of SNP markers for the South African abalone, Haliotis midae. A cDNA library was constructed from which ESTs were selected for the screening of SNPs. The observed frequency of SNPs in this species was estimated at one every 185 bp. When characterized in wild-caught abalone, the minor allele frequencies and F_ST estimates for every SNP indicated that these markers may potentially be useful for population analysis, parentage assignment and linkage mapping in Haliotis midae. No linkage disequilibrium was observed between SNPs originating from different EST sequences. These SNPs, together with additional SNPs currently being developed, will provide a useful complementary set of markers to the currently available genetic markers in abalone.

11.
A large number of maize single nucleotide polymorphism (SNP) candidate sequences have been generated and deposited in public databases. However, very little work has been done to date to comprehensively characterize those SNPs and identify a set of markers that would potentially have high impact in molecular genetics research and breeding programs. Here we describe a multi-step process to identify highly polymorphic gene-based SNPs among ~130,000 public markers. A set of 695 highly polymorphic SNPs (minor allele frequency value >0.3), identified within exons and 5′ and 3′ untranslated regions of genes, were converted into four of the most popular high-throughput genotyping assays, which include Illumina’s GoldenGate and Infinium chemistries, Life Technologies’ TaqMan assay and KBioSciences’ KASPar assay. The term “versatile” was applied to 162 gene-based SNPs that were successfully converted into all four chemistries and had perfect genotypic clustering patterns. This subset of discovered versatile SNP markers represents a universal tool for application in various molecular genetics and breeding projects in maize, where genotyping is based on one of the four above-mentioned chemistries. This study demonstrated that despite the availability of millions of discovered SNPs in maize, only a very small portion of those polymorphisms could be utilized for the development of robust, versatile assays and have real practical value in marker-assisted selection.

12.
We developed an automated pipeline for the detection of single nucleotide polymorphisms (SNPs) in expressed sequence tag (EST) data sets, by combining three DNA sequence analysis programs: Phred, Phrap and PolyBayes. This application requires access to the individual electropherogram traces. First, a reference set of 65 SNPs was obtained from the sequencing of 30 gametes in 13 maritime pine (Pinus pinaster Ait.) gene fragments (6671 bp), resulting in a frequency of 1 SNP every 102.6 bp. Second, parameters of the three programs were optimized in order to retrieve as many true SNPs as possible, while keeping the rate of false positives as low as possible. Overall, the efficiency of detection of true SNPs was 83.1%. However, this rate varied largely as a function of the rare SNP allele frequency: down to 41% for rare SNP alleles (frequency < 10%), up to 98% for allele frequencies above 10%. Third, the detection method was applied to the 18498 assembled maritime pine (Pinus pinaster Ait.) ESTs, allowing the identification of a total of 1400 candidate SNPs, in contigs containing between 4 and 20 sequence reads. These genetic resources, described for the first time in a forest tree species, were made available at http://www.pierroton.inra/genetics/Pinesnps. We also derived an analytical expression for the SNP detection probability as a function of the SNP allele frequency, the number of haploid genomes used to generate the EST sequence database, and the sample size of the contigs considered for SNP detection. The frequency of the SNP allele was shown to be the main factor influencing the probability of SNP detection.
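
The abstract does not reproduce the authors' analytical expression, so the sketch below is only a simplified stand-in that shows why allele frequency and contig depth matter. It assumes each read in a contig carries the rare allele independently with probability equal to its frequency, and that a SNP counts as detectable when both alleles appear in at least `min_reads` reads. The number of haploid genomes behind the EST library, which the authors' expression includes, is ignored here; the function names and the `min_reads` rule are hypothetical. The first helper simply restates the reported density of one SNP every 102.6 bp.

```python
from math import comb


def snp_density(sequence_length_bp: int, snp_count: int) -> float:
    """Average number of base pairs per SNP."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return sequence_length_bp / snp_count


def detection_probability(minor_freq: float, depth: int, min_reads: int = 2) -> float:
    """Probability that both alleles are seen in at least `min_reads` reads.

    Simplified binomial model (an assumption of this sketch): the number of
    reads carrying the rare allele is Binomial(depth, minor_freq).
    """
    if not 0.0 <= minor_freq <= 1.0:
        raise ValueError("minor_freq must lie in [0, 1]")
    if depth < 0 or min_reads < 0:
        raise ValueError("depth and min_reads must be non-negative")
    return float(sum(
        comb(depth, i) * minor_freq ** i * (1.0 - minor_freq) ** (depth - i)
        for i in range(min_reads, depth - min_reads + 1)
    ))


if __name__ == "__main__":
    print(round(snp_density(6671, 65), 1))
    for freq in (0.05, 0.10, 0.30):
        print(freq, detection_probability(freq, depth=20))
```

Even in this crude model, rare alleles are much less likely to be picked up than common ones at the same depth, in line with the trend the abstract reports.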

13.
Because cultivated tomato (Solanum lycopersicum L.) is low in genetic diversity, public, verified single nucleotide polymorphism (SNP) markers within the species are in demand. To promote marker development we resequenced approximately 23 kb in a diverse set of 31 tomato lines including TA496. Three classes of markers were sampled: (1) 26 expressed-sequence tag (EST), all of which were predicted to be polymorphic based on TA496, (2) 14 conserved ortholog set II (COSII) or unigene, and (3) ten published sequences, composed of nine fruit quality genes and one anonymous RFLP marker. The latter two types contained mostly noncoding DNA. In total, 154 SNPs and 34 indels were observed. The distributions of nucleotide diversity estimates among marker types were not significantly different from each other. Ascertainment bias of SNPs was evaluated for the EST markers. Despite the fact that the EST markers were developed using SNP prediction within a sample consisting of only one TA496 allele and one additional allele, the majority of polymorphisms in the 26 EST markers were represented among the other 30 tomato lines. Fifteen EST markers with published SNPs were more closely examined for bias. Mean SNP diversity observations were not significantly different between the original discovery sample of two lines (53 SNPs) and the 31 line diversity panel (56 SNPs). Furthermore, TA496 shared its haplotype with at least one other line at 11 of the 15 markers. These data demonstrate that public EST databases and noncoding regions are a valuable source of unbiased SNP markers in tomato.

14.
A single nucleotide polymorphism (SNP) is a difference in DNA sequence between individuals and provides abundant information about genetic variation. Large-scale discovery of high-frequency SNPs is being undertaken using various methods. However, the publicly available SNP data sometimes need to be verified. If only a particular gene locus is concerned, locus-specific polymerase chain reaction amplification may be useful. A problem with this method is that the secondary peak has to be measured. We have analyzed trace data from conventional sequencing equipment and found an applicable rule to discern SNPs from noise. The rule is applied to multiply aligned sequences with a trace, and the peak heights of the traces are compared between samples. We have developed software that integrates this function to automatically identify SNPs. The software works accurately for high quality sequences and can also detect SNPs in low quality sequences. Further, it can determine allele frequency, display this information as a bar graph and assign corresponding nucleotide combinations. It is also designed so that a person can easily verify and edit sequences on the screen. It is very useful for identifying de novo SNPs in a DNA fragment of interest.

15.
Biallelic markers, most commonly single nucleotide polymorphisms (SNPs), are widely utilized in genetic association analysis, which can be sped up by estimating allele frequency in pooled DNA instead of genotyping individuals. Several methods have shown high accuracy and precision for allele frequency estimation in pools. Here, we explored PCR restriction fragment length polymorphism (PCR–RFLP) combined with microchip electrophoresis as a possible strategy for allele frequency estimation in DNA pools. We used the commercially available Agilent 2100 microchip electrophoresis analysis system to quantify the enzymatically digested DNA fragments and the fluorescence intensities to estimate the allele frequencies in the DNA pools. In this study, we estimated the allele frequencies of five SNPs in a DNA pool composed of 141 previously genotyped healthy controls and a DNA pool composed of 96 previously genotyped gastric cancer patients with a frequency representation of 10–90% for the variant allele. Our studies show that accurate, quantitative data on allele frequencies, suitable for investigating the association of SNPs with complex disorders, can be estimated from pooled DNA samples by using this assay. This approach, being independent of the number of samples, promises to drastically reduce the labor and cost of genotyping in the initial association analysis.

16.
We developed a modified allele-specific PCR procedure for assaying single nucleotide polymorphisms (SNPs) and used the procedure (called SNAP for single-nucleotide amplified polymorphisms) to generate 62 Arabidopsis mapping markers. SNAP primers contain a single base pair mismatch within three nucleotides from the 3' end of one allele (the specific allele) and in addition have a 3' mismatch with the nonspecific allele. A computer program called SNAPER was used to facilitate the design of primers that generate at least a 1,000-fold difference in the quantity of the amplification products from the specific and nonspecific SNP alleles. Because SNAP markers can be readily assayed by electrophoresis on standard agarose gels and because a public database of over 25,000 SNPs is available between the Arabidopsis Columbia and Landsberg erecta ecotypes, the SNAP method greatly facilitates the map-based cloning of Arabidopsis genes defined by a mutant phenotype.

17.
Single-nucleotide polymorphisms (SNPs) are rapidly replacing microsatellites as the markers of choice for genetic linkage studies and many other studies of human pedigrees. Here, we describe an efficient approach for modeling linkage disequilibrium (LD) between markers during multipoint analysis of human pedigrees. Using a gene-counting algorithm suitable for pedigree data, our approach enables rapid estimation of allele and haplotype frequencies within clusters of tightly linked markers. In addition, with the use of a hidden Markov model, our approach allows for multipoint pedigree analysis with large numbers of SNP markers organized into clusters of markers in LD. Simulation results show that our approach resolves previously described biases in multipoint linkage analysis with SNPs that are in LD. An updated version of the freely available Merlin software package uses the approach described here to perform many common pedigree analyses, including haplotyping and haplotype frequency estimation, parametric and nonparametric multipoint linkage analysis of discrete traits, variance-components and regression-based analysis of quantitative traits, calculation of identity-by-descent or kinship coefficients, and case selection for follow-up association studies. To illustrate the possibilities, we examine a data set that provides evidence of linkage of psoriasis to chromosome 17.

18.
Single-nucleotide polymorphisms (SNPs) are considered useful polymorphic markers for genetic studies of polygenic traits. A new practical approach to high-throughput genotyping of SNPs in a large number of individuals is needed in association studies and other studies on relationships between genes and diseases. We have developed an accurate and high-throughput method for determining allele frequencies by pooling the DNA samples and applying a DNA microarray hybridization analysis. In this method, the combination of the microarray, DNA pooling, probe pair hybridization, and fluorescent ratio analysis solves the dual problems of parallel multiple sample analysis and parallel multiplex SNP genotyping for association studies. Multiple DNA samples are immobilized on a slide and a single hybridization is performed with a pool of allele-specific oligonucleotide probes. The results of this study show that microarray hybridization of pooled DNA samples can accurately estimate absolute allele frequencies in a sample pool. This method can also be used to identify differences in allele frequencies in distinct populations. It is amenable to automation and is suitable for immediate use in high-throughput genotyping of SNPs.

19.
Single nucleotide polymorphisms (SNPs), or biallelic markers, are popular in genetic linkage studies due to their abundance in the genome, stability, and ease of scoring. We determined the 'information ratio' (IR) of closely spaced SNPs in simulated nuclear families and affected sib pairs (ASPs). (The IR is the ratio of the actual average maximum lod score to the maximum lod score attainable if the marker were fully informative.) The nuclear families included parental information, whereas the ASPs did not. We analyzed these SNPs in two ways: (1) using multipoint analysis, and (2) treating the SNPs as 'composite markers' (i.e., haplotypes, as assigned by GENEHUNTER). We also calculated the IR of a single microsatellite marker with multiple alleles and compared it with the IR from the SNPs. For each set of input conditions, we simulated 1000 nuclear families, of 2, 3, 4, or 5 children each, as well as 1000 ASPs. We generated SNP marker data for strings of k = 1, 2, 3, 5, 7, and 10 SNP loci, with no recombination (theta = 0) and no linkage disequilibrium among the SNPs. The MAF (minor allele frequency) was either 0.5 or 0.25, and allele frequencies were the same for all k loci in any analysis. We also generated marker data for one single-locus microsatellite marker, with m = 3, 4, 5, 6, 7, and 9 equally frequent alleles. In all simulations, the disease was fully penetrant dominant, and there was no recombination or linkage disequilibrium among markers or between marker and disease. When multipoint analysis was used, we found that 5-7 closely spaced SNPs were usually enough to yield an IR of approximately 100%, for nuclear families of any size. However, for the ASPs, even 7-10 SNPs yielded an IR of only 70-80%. A microsatellite with 9 equally frequent alleles yielded about the same IR (86-88%) as a string of 4-5 SNPs, in nuclear families. SNPs analyzed as 'composite markers' performed worse, due to the inherent ambiguity of SNP haplotyping.
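
The information ratio defined in this abstract is a simple quotient, which the following sketch makes concrete. The function name and the example lod scores are hypothetical; the abstract does not describe how the authors computed the fully informative maximum, so it is passed in as a plain number.

```python
from typing import Sequence


def information_ratio(observed_max_lods: Sequence[float],
                      max_attainable_lod: float) -> float:
    """Average observed maximum lod score divided by the maximum lod score
    attainable with a fully informative marker."""
    if not observed_max_lods:
        raise ValueError("at least one observed lod score is required")
    if max_attainable_lod <= 0:
        raise ValueError("max_attainable_lod must be positive")
    average = sum(observed_max_lods) / len(observed_max_lods)
    return average / max_attainable_lod


if __name__ == "__main__":
    # Made-up lod scores from three simulated replicates.
    print(information_ratio([2.0, 3.0, 4.0], 4.0))
```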

20.
The SNP Consortium website (http://snp.cshl.org) has undergone many changes since its initial conception three years ago. The database back end has been changed from the venerable ACeDB to the more scalable MySQL engine. Users can access the data via gene or single nucleotide polymorphism (SNP) keyword searches and browse or dump SNP data to text files. A graphical genome browsing interface shows SNPs mapped onto the genome assembly in the context of externally available gene predictions and other features. SNP allele frequency and genotype data are available via FTP download and on individual SNP report web pages. SNP linkage maps are available for download and for browsing in a comparative map viewer. All software components of the data coordinating center (DCC) website are open source.
