Similar Articles
1.
Single nucleotide polymorphisms (SNPs), which are the most abundant form of genetic variation in numerous organisms, have emerged as important tools for studying complex genetic traits and deciphering genome evolution. High-throughput genome sequencing projects worldwide provide an unprecedented opportunity for whole-genome SNP analysis in a variety of species. To facilitate SNP discovery in vertebrates, we have developed a web-based, user-friendly, and fully automated application, DigiPINS, for genome-wide identification of exonic SNPs from EST data. Currently, the database can be used to mine exonic SNPs in six complete genomes (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus and Danio rerio). In addition to providing information on sequence conservation, DigiPINS allows compilation of comprehensive sets of polymorphisms within cancer candidate genes or identification of novel cancer markers, making it potentially useful for cancer association studies. The DigiPINS server is available via the internet at http://pbil.univ-lyon1.fr/gem/DigiPINS/query_DigiPINS.php.
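The abstract does not describe how DigiPINS detects exonic SNPs. As a rough illustration of the general idea of mining SNPs from redundant EST data, here is a minimal Python sketch; it assumes the ESTs are already aligned gaplessly to a reference transcript, and the function name `find_candidate_snps` and the `min_support` threshold are hypothetical.

```python
from collections import Counter

def find_candidate_snps(reference, aligned_ests, min_support=2):
    """Return (position, ref_base, alt_base, alt_count) tuples for alignment
    columns where a non-reference base is seen in at least `min_support` ESTs.

    Assumes every EST is already aligned to `reference` without gaps and is
    padded with '-' wherever it does not cover the reference.
    """
    snps = []
    for pos, ref_base in enumerate(reference):
        # count the bases carried by the ESTs that cover this column
        column = Counter(est[pos] for est in aligned_ests
                         if pos < len(est) and est[pos] != "-")
        for base, count in column.items():
            # requiring support from several ESTs discounts single-read errors
            if base != ref_base and base in "ACGT" and count >= min_support:
                snps.append((pos, ref_base, base, count))
    return snps

reference = "ACGTACGTAC"
ests = ["ACGTACGTAC", "ACGAACGTAC", "ACGAACGTAC", "ACGTACCTAC", "--GTACGTA-"]
print(find_candidate_snps(reference, ests))  # [(3, 'T', 'A', 2)]
```

The C seen in a single EST at position 6 is dropped because it lacks independent support. Lowering `min_support` to 1 would report it too, trading specificity for sensitivity.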

2.
SUMMARY: Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation in closely related microbial species, strains or isolates. Some SNPs confer selective advantages for microbial pathogens during infection, and many others are powerful genetic markers for distinguishing closely related strains or isolates that could not be distinguished otherwise. To facilitate SNP discovery in microbial genomes, we have developed a web-based application, SNPsFinder, for genome-wide identification of SNPs. SNPsFinder takes multiple genome sequences as input to identify SNPs within homologous regions. It can also take contig sequences and sequence quality scores from ongoing sequencing projects for SNP prediction. SNPsFinder will use genome sequence annotation if available and map the predicted SNP regions to known genes or regions to assist further evaluation of the predicted SNPs for their functional significance. SNPsFinder can generate PCR primers for all predicted SNP regions according to the user's input parameters to facilitate experimental validation. The results from SNPsFinder analysis are accessible through the World Wide Web. AVAILABILITY: The SNPsFinder program is available at http://snpsfinder.lanl.gov/. SUPPLEMENTARY INFORMATION: The user's manual is available at http://snpsfinder.lanl.gov/UsersManual/
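As a rough sketch of comparing homologous regions while respecting quality scores from ongoing sequencing projects, the Python below compares two already-aligned sequences. It reports differences only where both bases meet a quality threshold, and it returns flanking context of the kind a primer-design step would need. The function name, the `min_qual` and `flank` parameters, and the output format are assumptions, not the SNPsFinder implementation.

```python
def pairwise_snps(seq_a, seq_b, qual_a=None, qual_b=None, min_qual=20, flank=5):
    """Report single-base differences between two aligned homologous sequences.

    Optional per-base quality lists (e.g. Phred scores) suppress calls where
    either base is below `min_qual`. Each hit includes up to `flank` bases of
    context on each side, taken from `seq_a`.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical, gap, or ambiguous base
        if qual_a is not None and qual_a[i] < min_qual:
            continue  # low-confidence base in the first sequence
        if qual_b is not None and qual_b[i] < min_qual:
            continue  # low-confidence base in the second sequence
        start, end = max(0, i - flank), min(len(seq_a), i + flank + 1)
        hits.append({"pos": i, "a": a, "b": b, "context": seq_a[start:end]})
    return hits

seq_a = "AAAACAAAAGAAAA"
seq_b = "AAAATAAAAAAAAA"
qual_b = [30] * 14
qual_b[9] = 10  # the G/A difference at position 9 rests on a poor-quality base
print(pairwise_snps(seq_a, seq_b, [30] * 14, qual_b))
```

Only the C/T difference at position 4 survives; the difference at position 9 is filtered because one of its bases has quality 10.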

3.
4.
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines to highlight that the choice of tools significantly affects the final results.
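The last two pipeline stages, SNP calling and candidate filtering, can be sketched as a simple threshold caller. This is a minimal illustration, not any specific tool reviewed in the article. It assumes a hypothetical simplified pileup of (position, reference base, read bases, base qualities) per site, produced after mapping and recalibration, and it uses illustrative thresholds.

```python
from collections import Counter

def call_snps(pileup, min_base_qual=20, min_depth=10, min_alt_frac=0.2):
    """Threshold-based SNP calling on a simplified pileup.

    `pileup` yields (pos, ref_base, read_bases, base_quals) tuples, a
    hypothetical stand-in for the per-position output of a mapping step.
    """
    calls = []
    for pos, ref, bases, quals in pileup:
        # drop low-quality bases before counting alleles
        kept = Counter(b for b, q in zip(bases, quals) if q >= min_base_qual)
        depth = sum(kept.values())
        if depth < min_depth:
            continue  # filter: insufficient coverage
        # most frequent non-reference allele, if any
        alt, alt_count = max(((b, n) for b, n in kept.items() if b != ref),
                             key=lambda x: x[1], default=(None, 0))
        if alt is not None and alt_count / depth >= min_alt_frac:
            calls.append((pos, ref, alt, depth, round(alt_count / depth, 3)))
    return calls

pileup = [
    (100, "A", "AAAAAGGGGG", [30] * 10),              # balanced: called
    (101, "C", "CCCCCCCCCT", [30] * 10),              # 10% alt: filtered
    (102, "T", "TTGG", [30] * 4),                     # depth 4: filtered
    (103, "A", "AAAAAGGGGGG", [30] * 5 + [10] * 6),   # alt bases low quality
]
print(call_snps(pileup))  # [(100, 'A', 'G', 10, 0.5)]
```

Because every threshold changes which candidates survive, small differences between tools and settings propagate into the final call set. This is consistent with the article's observation that tool choice affects the results.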

5.
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep‐sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact on SNP discovery of not accounting for systematic sequencing errors when filtering identified polymorphisms. We used SAMtools to identify polymorphisms in a velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’ (polymorphisms caused by sequencing or assembly error). An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand‐bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single‐copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.
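Three of the five predictors (depth, number of variant-supporting reads, and strand bias) are commonly turned into hard filters. The minimal sketch below does this in Python, testing strand bias with a two-sided Fisher's exact test on forward/reverse counts for the reference and variant alleles. The function names and thresholds are illustrative assumptions, not the filters used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # sum the probabilities of all tables no more likely than the observed one
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))

def passes_filters(ref_fwd, ref_rev, alt_fwd, alt_rev,
                   min_depth=10, min_alt_reads=3, min_strand_p=0.01):
    """Hypothetical hard filters on depth, variant support and strand bias."""
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    if depth < min_depth or alt_fwd + alt_rev < min_alt_reads:
        return False
    # strand bias: variant reads should split across strands like reference reads
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev) >= min_strand_p

print(passes_filters(10, 10, 5, 5))   # True: variant seen on both strands
print(passes_filters(20, 20, 10, 0))  # False: variant only on the forward strand
```

A variant supported only by reads from one strand is a typical signature of a systematic, rather than random, sequencing error. For that reason the strand-bias test complements the depth and read-support thresholds.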

6.
SNP discovery in associating genetic variation with human disease phenotypes   Times cited: 11, self-citations: 0, citations by others: 11
Suh Y, Vijg J. Mutation Research, 2005, 573(1-2): 41-53
With the completion of the human genome project, attention is now rapidly shifting towards the study of individual genetic variation. The most abundant source of genetic variation in the human genome is represented by single nucleotide polymorphisms (SNPs), which can account for heritable inter-individual differences in complex phenotypes. Identification of SNPs that contribute to susceptibility to common diseases will provide highly accurate diagnostic information that will facilitate early diagnosis, prevention, and treatment of human diseases. Over the past several years, the advancement of increasingly high-throughput and cost-effective methods to discover and measure SNPs has begun to open the door towards this endeavor. Genetic association studies are considered to be an effective approach towards the detection of SNPs with moderate effects, as in most common diseases with complex phenotypes. This requires careful study design, analysis and interpretation. In this review, we discuss genetic association studies and address the prospect for candidate gene association studies, comparing the strengths and weaknesses of indirect and direct study designs. Our focus is on the continuous need for SNP discovery methods and the use of currently available prescreening methods for large-scale genetic epidemiological research until more advanced sequencing methods currently under development become available.
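A basic building block of a candidate-gene case-control association study is an allelic test on a 2×2 table of allele counts. The Python sketch below computes Pearson's chi-square with 1 degree of freedom and the allelic odds ratio. The function name and the example counts are hypothetical, and the test assumes allele counts are independent, which holds under Hardy-Weinberg equilibrium.

```python
from math import erfc, sqrt

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df), p-value and odds ratio for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = [sum(r) for r in table]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    # sum of (observed - expected)^2 / expected over the four cells
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p, odds_ratio

# hypothetical counts: 400 case and 400 control chromosomes
chi2, p, odds_ratio = allelic_association(120, 280, 80, 320)
print(round(chi2, 3), round(odds_ratio, 3))  # 10.667 1.714
```

An odds ratio near 1.7 is the kind of "moderate effect" the review refers to. Detecting such effects reliably across many SNPs requires large samples and correction for multiple testing, which is part of why careful study design matters.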

7.
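Single nucleotide polymorphisms in rice and the current status of their application   Times cited: 6, self-citations: 0, citations by others: 6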
Liu Chuanguang, Zhang Guiquan. Hereditas (Beijing), 2006, 28(6): 737-744
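Single nucleotide polymorphisms (SNPs) are abundant in rice, densely distributed, and highly stable in inheritance. The main methods for discovering rice SNPs include direct sequencing of PCR products amplified from sample DNA, detection of SNPs within SSR regions, and direct searching of genome sequences. A variety of genotyping technologies have already been applied to rice SNP detection, and the high degree of automation in SNP detection makes rice SNP genotyping very convenient. SNPs have great potential for applications in rice genetic map construction, gene cloning and functional genomics research, marker-assisted selection breeding, classification of genetic resources, and studies of species evolution.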

8.
Publicly available single nucleotide polymorphism (SNP) allele frequencies are an important resource for the selection of genetic markers that may be most useful for gene mapping and association studies. Mining these allele frequencies from disparate public databases and websites is time-consuming and can yield inconsistent findings. We have developed a web-based software tool, Frequency Finder, to acquire SNP allele frequencies from multiple public data sources and return a summarized result to the user. Our software optimizes and automates the search for candidate markers, decreasing the time it would take to extract pertinent data manually. We have included several output options, including on-screen display and a compressed text file. We show that Frequency Finder accurately retrieves available frequency data from the available sources. Using this tool, we detect significant differences between Asian, African and Caucasian populations in the allele frequency spectra of 246 097 SNPs. While limited to public databases that provide web-based access to allele frequencies, Frequency Finder provides a single, user-friendly interface for retrieving allele frequencies for large batches of SNPs from multiple data sources.
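The abstract does not state how per-source frequencies are summarized or which test was used to compare populations. As an illustration only, the Python sketch below pools allele counts per population across sources. It then applies a chi-square test of homogeneity across exactly three populations, which gives 2 degrees of freedom and a closed-form p-value. The record schema and function names are hypothetical.

```python
from math import exp

def pool_counts(records):
    """Sum (alt_count, total_count) per population across data sources.

    `records` is a list of (source, population, alt_count, total_count)
    tuples; this schema is a hypothetical stand-in for retrieved data.
    """
    pooled = {}
    for _source, pop, alt, total in records:
        a, t = pooled.get(pop, (0, 0))
        pooled[pop] = (a + alt, t + total)
    return pooled

def homogeneity_test_3pops(pooled):
    """Chi-square test that three populations share one allele frequency."""
    if len(pooled) != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 populations")
    alt_total = sum(a for a, _ in pooled.values())
    n = sum(t for _, t in pooled.values())
    if alt_total in (0, n):
        return 0.0, 1.0  # monomorphic across all populations: nothing to test
    p_bar = alt_total / n
    chi2 = 0.0
    for a, t in pooled.values():
        for observed, expected in ((a, t * p_bar), (t - a, t * (1 - p_bar))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2, exp(-chi2 / 2)  # survival function of chi-square with 2 df

records = [("db1", "ASN", 30, 100), ("db2", "ASN", 20, 100),
           ("db1", "AFR", 100, 200), ("db1", "EUR", 50, 200)]
pooled = pool_counts(records)
chi2, p = homogeneity_test_3pops(pooled)
print(pooled["ASN"], round(chi2, 2))  # (50, 200) 37.5
```

Pooling raw counts rather than averaging frequencies weights each source by its sample size. This is one reasonable way to reconcile the inconsistent results that the abstract attributes to manual searches.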

9.
As large-scale sequencing efforts turn from single genome sequencing to polymorphism discovery, single nucleotide polymorphisms (SNPs) are becoming an increasingly important class of population genetic data. But because of the ascertainment biases introduced by many methods of SNP discovery, most SNP data cannot be analyzed using classical population genetic methods. Statistical methods must instead be developed that can explicitly take into account each method of SNP discovery. Here we review some of the current methods for analyzing SNPs and derive sampling distributions for single SNPs and pairs of SNPs for some common SNP discovery schemes. We also show that the ascertainment scheme has a large effect on the estimation of linkage disequilibrium and recombination, and describe some methods of correcting for ascertainment biases when estimating recombination rates from SNP data.
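One common discovery scheme can be modeled concretely. SNPs are first found as polymorphic in a small panel of d chromosomes and then typed in a larger sample of n chromosomes that contains the panel. A site with i derived copies in the sample is ascertained unless the panel happens to be monomorphic, which gives a hypergeometric probability. The Python sketch below uses that probability to reweight the standard neutral frequency spectrum, which is proportional to 1/i. It illustrates the general idea under these stated assumptions and is not necessarily the derivation given in the article.

```python
from math import comb

def ascertainment_prob(i, n, d):
    """P(a site with i derived copies among n sampled chromosomes is polymorphic
    in a discovery panel of d chromosomes drawn without replacement from them)."""
    total = comb(n, d)
    mono_derived = comb(i, d)        # panel carries only the derived allele
    mono_ancestral = comb(n - i, d)  # panel carries only the ancestral allele
    return 1 - (mono_derived + mono_ancestral) / total

def ascertained_sfs(n, d):
    """Expected frequency spectrum (i = 1..n-1) of ascertained SNPs under the
    standard neutral model, whose unascertained spectrum is proportional to 1/i."""
    weights = [ascertainment_prob(i, n, d) / i for i in range(1, n)]
    s = sum(weights)
    return [w / s for w in weights]

n = 10
neutral = [1 / i for i in range(1, n)]
neutral = [w / sum(neutral) for w in neutral]
biased = ascertained_sfs(n, d=2)
print(round(neutral[0], 3), round(biased[0], 3))  # 0.353 0.2
```

With a two-chromosome panel, the weights reduce to values proportional to (n - i). Singletons therefore drop from about 35% to 20% of ascertained SNPs, while intermediate-frequency variants are over-represented. Classical estimators that assume the unbiased spectrum will be distorted unless they model this step.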

10.
Discovery of single nucleotide polymorphisms (SNPs) requires analysis of redundant sequences such as those available in large public databases. The ability to detect SNPs, especially those of low frequency, depends on the depth and scale of the discovery effort. Large numbers of SNPs have been identified by mining large-scale EST surveys and whole genome sequencing projects. These surveys, however, are subject to ascertainment bias and the inherent errors of large-scale single-pass sequencing efforts. For example, the number of steps involved in the construction and sequencing of cDNA libraries makes ESTs highly error prone, resulting in an increased frequency of nonvalid SNPs obtained in these surveys. Sequences of mtDNA genes are often incorporated into cDNA libraries as an artifact of the library construction process and are typically either subtracted from cDNA libraries or considered superfluous when evaluating the information content of EST datasets. Sequences of mtDNA genes provide a unique resource for the analysis of SNP parameters in EST projects. This study uses sequences from four turkey muscle cDNA libraries to demonstrate how mtDNA sequences gleaned from collections of ESTs can be used to estimate SNP parameters and thus help predict the validity of SNPs.
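One way to turn mtDNA ESTs into a SNP validity estimate is sketched below in Python. The sketch treats each library's mtDNA as effectively monomorphic, so every mismatch among mtDNA reads counts as a sequencing error. It then estimates how often errors alone would produce an apparent SNP at a given read depth. The independence assumption, the even split of errors over the three alternative bases, the example counts, and the function names are all hypothetical, not the study's method.

```python
from math import comb

def error_rate_from_mtdna(mismatches, aligned_bases):
    """Per-base discrepancy rate, counting every mtDNA mismatch as an error."""
    return mismatches / aligned_bases

def prob_false_snp(depth, min_alt_reads, error_rate):
    """P(at least `min_alt_reads` of `depth` reads show one particular wrong base),
    assuming independent errors spread evenly over the 3 alternative bases."""
    e = error_rate / 3
    return sum(comb(depth, k) * e**k * (1 - e)**(depth - k)
               for k in range(min_alt_reads, depth + 1))

rate = error_rate_from_mtdna(30, 10_000)  # hypothetical counts
print(rate, prob_false_snp(10, 2, rate))
```

Requiring two or more reads to share the variant makes error-only calls rare at this hypothetical error rate. Raising the estimated error rate, as the abstract suggests is typical for ESTs, quickly inflates the expected share of nonvalid SNPs.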

12.
Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species.

13.
MOTIVATION: Single nucleotide polymorphisms (SNPs) are among the most abundant genetic variations in the human genome. Recently, several platforms for high-throughput SNP analysis have become available, capable of measuring thousands of SNPs across the genome. Tools for analysing and visualizing these large genetic data sets in a biologically relevant manner are rare. This hinders effective use of SNP-array data in research on complex diseases, such as cancer. RESULTS: We describe a computational framework to analyse and visualize SNP-array data and link the results to relevant databases. Our major objective is to develop methods for identifying DNA regions that likely harbour recessive mutations. Thus, the algorithms are designed to have high sensitivity, and the identified regions are ranked using a scoring algorithm. We have also developed annotation tools that automatically query gene IDs, exon counts, microarray probe IDs, etc. In our case study, we apply the methods to identify candidate regions for recessively inherited colorectal cancer predisposition and suggest directions for wet-lab experiments. AVAILABILITY: An R-package implementation is available at http://www.ltdk.helsinki.fi/sysbio/csb/downloads/CohortComparator/
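The abstract ranks candidate regions for recessive mutations but does not give the algorithm. Below is a minimal Python sketch of one common heuristic, scanning a single sample's SNP-array genotype calls for runs of homozygosity. The genotype encoding, the `min_snps` threshold and the SNP-count score are hypothetical, and the framework described above is an R package whose method may differ.

```python
def homozygous_runs(calls, min_snps=5):
    """Find runs of consecutive homozygous SNP calls, ranked by length.

    `calls` is a list of (position, genotype) sorted by position, where the
    genotype is "AA", "AB", "BB" or None (no call). No-calls neither extend nor
    break a run; a heterozygous call ends it. Runs are scored by SNP count.
    """
    runs, start, end, count = [], None, None, 0
    for pos, gt in calls + [(None, "AB")]:  # sentinel het call flushes the last run
        if gt is None:
            continue
        if gt in ("AA", "BB"):
            if count == 0:
                start = pos
            end, count = pos, count + 1
        else:
            if count >= min_snps:
                runs.append({"start": start, "end": end, "n_snps": count})
            count = 0
    return sorted(runs, key=lambda r: r["n_snps"], reverse=True)

calls = [(1, "AA"), (2, "BB"), (3, "AA"), (4, "AB"), (5, "AA"), (6, None),
         (7, "BB"), (8, "AA"), (9, "AA"), (10, "BB"), (11, "AB")]
print(homozygous_runs(calls))  # [{'start': 5, 'end': 10, 'n_snps': 5}]
```

Tolerating no-calls keeps sensitivity high, in line with the design goal stated in the abstract. A production method would also need to tolerate occasional genotyping errors inside a run and to compare runs shared across affected individuals.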

14.
ABSTRACT: BACKGROUND: A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). RESULTS: The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here to use SOLiD or other types of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified.
A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having 'Chandler' as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome. CONCLUSION: The updated AGSNP pipeline is a versatile tool for high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs was identified in a single walnut cultivar.
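The two principal steps, filtering repeats by relative short-read depth and then calling SNPs in a heterozygous individual, can be sketched in Python as below. The ratio window, the minimum minor-allele fraction, the depth cap and the function names are hypothetical illustrations, not the published AGSNP filtering parameters. Those parameters were tuned against genotyping results, and the example uses the reported 32.6X coverage as the expected depth.

```python
def single_copy(mean_depth, expected_depth, low=0.5, high=1.5):
    """Treat a reference sequence as single-copy when its mean short-read depth
    is close to the depth expected from genome coverage; repeats attract
    reads from all of their copies and pile up well above that."""
    ratio = mean_depth / expected_depth
    return low <= ratio <= high

def heterozygous_site(ref_count, alt_count, expected_depth,
                      min_minor_frac=0.25, max_rel_depth=1.5):
    """Call a SNP in a heterozygous individual when both alleles are present in
    roughly balanced proportions and the site is not over-covered."""
    depth = ref_count + alt_count
    if depth == 0 or depth > max_rel_depth * expected_depth:
        return False  # no data, or excess depth suggesting collapsed repeats
    return min(ref_count, alt_count) / depth >= min_minor_frac

coverage = 32.6  # genome equivalents of short reads, as reported above
print(single_copy(30.0, coverage), single_copy(120.0, coverage))   # True False
print(heterozygous_site(15, 14, coverage), heterozygous_site(28, 2, coverage))  # True False
```

In a heterozygous individual, a true SNP should appear at roughly half the reads. That expectation is why the homozygosity requirement of the original pipeline can be replaced by an allele-balance check.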

15.
Oilseed rape (Brassica napus) is an allotetraploid species consisting of two genomes, derived from B. rapa (A genome) and B. oleracea (C genome). The presence of these two genomes makes single nucleotide polymorphism (SNP) marker identification and SNP analysis more challenging than in diploid species, because for a given locus two versions of a DNA sequence (based on the two ancestral genomes) usually have to be analyzed simultaneously during SNP identification and analysis. One hundred amplicons derived from expressed sequence tags (ESTs) were analyzed to identify SNPs in a panel of oilseed rape varieties and within two sister species representing the ancestral genomes. A total of 604 SNPs were identified, averaging one SNP in every 42 bp. It was possible to clearly discriminate SNPs that are polymorphic between different plant varieties from SNPs differentiating the two ancestral genomes. To validate the identified SNPs for their use in genetic analysis, we have developed Illumina GoldenGate assays for some of the identified SNPs. Through the analysis of a number of oilseed rape varieties and mapping populations with GoldenGate assays, we were able to identify a number of different segregation patterns in allotetraploid oilseed rape. The majority of the identified SNP markers can be readily used for genetic mapping, showing that amplicon sequencing and Illumina GoldenGate assays can be used to reliably identify SNP markers in tetraploid oilseed rape and to convert them into successful SNP assays that can be used for genetic analysis.
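The distinction between varietal SNPs and SNPs that separate the A and C genomes can be illustrated with a simplified rule set. Each oilseed rape variety's call at a site is the set of bases seen when both homeologs are amplified together. The ancestral species supply the expected A-genome and C-genome bases. The rules and the function name below are hypothetical and are not the classification procedure used in the study.

```python
def classify_site(rapa_base, oleracea_base, variety_calls):
    """Classify a site in an amplicon that co-amplifies the A and C homeologs.

    `variety_calls` maps variety name -> set of bases observed at the site
    (both homeologs pooled). The rules are a simplified illustration.
    """
    patterns = {frozenset(bases) for bases in variety_calls.values()}
    if len(patterns) > 1:
        return "varietal"      # varieties differ, so the site can segregate
    if rapa_base != oleracea_base and patterns == {frozenset({rapa_base, oleracea_base})}:
        return "homeologous"   # A/C genome difference shared by every variety
    return "monomorphic"

print(classify_site("A", "G", {"v1": {"A", "G"}, "v2": {"A", "G"}}))  # homeologous
print(classify_site("C", "C", {"v1": {"C"}, "v2": {"C", "T"}}))       # varietal
```

A homeologous difference looks like a fixed heterozygote in every variety, so it does not segregate in a cross. This is why separating it from varietal polymorphism matters before an assay is used for mapping.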

16.
The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in the annotation of individual SNPs. Currently, such annotations are available only through curation or comparison to a reference genome. Many species lack a reference genome but are still important genetic models or significant species in agricultural production or natural ecosystems. For these species, it is possible to annotate SNPs through comparison with cDNA or with data from well‐annotated genes in public repositories. We present SNPMeta, a tool that gathers information about SNPs by comparison with sequences present in GenBank databases. SNPMeta is able to annotate SNPs from contextual sequence in SNP assay designs, as well as SNPs discovered through genotyping by sequencing (GBS) approaches. However, SNPs discovered through GBS occur throughout the genome rather than only in gene space, and therefore do not annotate at high rates. SNPMeta can be used to annotate SNPs in nonmodel species or species that lack a reference genome, and the annotations it generates are highly concordant with those that would be obtained from a reference genome.
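Once a SNP's context sequence has been matched to an annotated coding sequence, a typical annotation is whether the substitution changes the encoded amino acid. The Python sketch below covers only that final step, using the standard genetic code. It assumes the SNP position is already known within an in-frame coding sequence, and the function name and output fields are hypothetical. The GenBank search that SNPMeta performs is not shown.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# standard genetic code, codons enumerated in TCAG order
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def annotate_snp(cds, pos, alt):
    """Classify a substitution at 0-based `pos` within an in-frame coding sequence."""
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "nonsynonymous"
    return {"codon": f"{ref_codon}>{alt_codon}", "aa": f"{ref_aa}>{alt_aa}", "effect": effect}

cds = "ATGGCTTGGAAA"  # hypothetical coding sequence: M A W K
print(annotate_snp(cds, 5, "C"))  # GCT>GCC, A>A, synonymous
print(annotate_snp(cds, 9, "G"))  # AAA>GAA, K>E, nonsynonymous
```

This step needs a coding-sequence match, which explains why GBS SNPs, most of which fall outside genes, annotate at lower rates.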

17.
Dou J, Zhao X, Fu X, Jiao W, Wang N, Zhang L, Hu X, Wang S, Bao Z. Biology Direct, 2012, 7(1): 17-9
ABSTRACT: BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates a mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulated and real sequencing datasets, we demonstrate that, in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for reference-free genotyping in diploid genomes with high repeat content. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in non-model organisms without a reference genome.
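For context, the baseline that iML improves on is maximum-likelihood genotype calling. The Python sketch below shows only a plain ML caller under a simple binomial read model with an assumed per-read error rate. This model is an assumption and may not be the ML variant compared in the paper. It omits iML's mixed Poisson/normal model for detecting composite read clusters from repeats, and the function names are hypothetical.

```python
from math import comb, log

def genotype_likelihoods(ref_count, alt_count, error=0.01):
    """Log-likelihoods of the diploid genotypes AA, AB, BB (A = reference allele)
    under a binomial model in which each read independently shows B with
    probability error, 0.5 or 1 - error respectively."""
    n = ref_count + alt_count
    p_alt = {"AA": error, "AB": 0.5, "BB": 1 - error}
    return {g: log(comb(n, alt_count)) + alt_count * log(p) + ref_count * log(1 - p)
            for g, p in p_alt.items()}

def ml_genotype(ref_count, alt_count, error=0.01):
    """Return the genotype with the highest likelihood."""
    ll = genotype_likelihoods(ref_count, alt_count, error)
    return max(ll, key=ll.get)

print(ml_genotype(20, 0), ml_genotype(10, 9), ml_genotype(0, 15))  # AA AB BB
```

Without a reference genome, reads from two paralogous copies can collapse into one cluster and look exactly like a balanced heterozygote to this model. That is the failure mode the iML depth model is designed to catch.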

18.
Forensically relevant SNP classes   Times cited: 2, self-citations: 0, citations by others: 2
Budowle B, van Daal A. BioTechniques, 2008, 44(5): 603-8, 610
Forensic samples that contain too little template DNA or are too degraded require genetic marker analyses or approaches other than those currently used for routine casework. Single nucleotide polymorphisms (SNPs) offer promise to support forensic DNA analyses because of an abundance of potential markers, amenability to automation, and a potential reduction in required fragment length to only 60-80 bp. SNP markers will serve an important role in analyzing challenging forensic samples, such as those that are very degraded, in augmenting the power of kinship analyses and family reconstructions for missing persons and unidentified human remains, and in providing investigative lead value in some cases without a suspect (and no genetic profile match in CODIS). SNPs for forensic analyses can be divided into four categories: identity-testing SNPs, lineage informative SNPs, ancestry informative SNPs, and phenotype informative SNPs. In addition to discussing the applications of these different types of SNPs, this article provides some discussion of privacy issues so that society and policymakers can be better informed.
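The evidential weight of identity-testing SNPs is usually expressed as a random match probability. Under the simplest assumptions, Hardy-Weinberg equilibrium within each SNP and independence between unlinked SNPs, this probability is the product of the profile's genotype frequencies. The Python sketch below computes it under those assumptions. The allele frequencies are hypothetical, and the sketch ignores population substructure corrections that casework may require.

```python
def genotype_frequency(p, genotype):
    """Hardy-Weinberg genotype frequency; `genotype` counts copies of the allele
    with frequency p (0, 1 or 2)."""
    q = 1 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}[genotype]

def random_match_probability(profile):
    """Product rule across SNPs assumed unlinked; `profile` is a list of
    (allele_frequency, genotype) pairs."""
    rmp = 1.0
    for p, genotype in profile:
        rmp *= genotype_frequency(p, genotype)
    return rmp

# hypothetical profile: 50 heterozygous SNPs, each with allele frequency 0.5
print(random_match_probability([(0.5, 1)] * 50))  # 2**-50, about 8.9e-16
```

Each biallelic SNP is individually weak (at best 0.5 for a heterozygote at p = 0.5), so identity panels need many unlinked, high-heterozygosity SNPs to reach the discrimination of standard STR profiles.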

19.
20.
Molecular breeding approaches are of growing importance to crop improvement. However, closely related cultivars generally used for crossing material lack sufficient known DNA polymorphisms due to their genetic relatedness. Next-generation sequencing allows the identification of a massive number of DNA polymorphisms such as single nucleotide polymorphisms (SNPs) and insertions-deletions (InDels) between highly homologous genomes. Using this technology, we performed whole-genome sequencing of a landrace of japonica rice, Omachi, which is used for sake brewing and is an important source for modern cultivars. A total of 229 million reads, each comprising 75 nucleotides of the Omachi genome, was generated with 45-fold coverage and uniquely mapped to 89.7% of the Nipponbare genome, a closely related cultivar. We identified 132,462 SNPs, 16,448 insertions and 19,318 deletions between the Omachi and Nipponbare genomes. An SNP array was designed to validate 731 selected SNPs, resulting in validation rates of 95 and 88% for the Omachi and Nipponbare genomes, respectively. Among the 577 SNPs validated in both genomes, 532 are entirely new SNP markers not previously reported between related rice cultivars. We also validated InDels on a part of chromosome 2 as DNA markers and successfully genotyped five japonica rice cultivars. Our results present the methodology and extensive data on SNPs and InDels available for whole-genome genotyping and marker-assisted breeding. The polymorphism information between Omachi and Nipponbare is available at NGRC_Rice_Omachi (http://www.nodai-genome.org/oryza_sativa_en.html).
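The reported sequencing depth can be checked from the read count and read length: mean coverage is the number of sequenced bases divided by genome size. The Python sketch below reproduces this arithmetic. The rice genome size of about 380 Mb is an assumption, since the abstract does not state it; the read numbers and validation counts come from the abstract.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

# read count and length from the abstract; ~380 Mb genome size is an assumption
coverage = mean_coverage(229e6, 75, 380e6)
print(f"{coverage:.1f}x")                                       # 45.2x
print(f"{577 / 731:.1%} of tested SNPs validated in both genomes")  # 78.9%
print(f"{532 / 577:.1%} of those were new markers")                  # 92.2%
```

The result agrees with the reported 45-fold coverage. The remaining two lines express the abstract's validation counts as proportions.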
