Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
MOTIVATION: Single Nucleotide Polymorphisms (SNPs) are believed to contribute strongly to the genetic variability in living beings, and SNP and mutation discovery are of great interest in today's Life Sciences. A comparatively new method to discover such polymorphisms is based on base-specific cleavage, where resulting cleavage products are analyzed by mass spectrometry (MS). One particular advantage of this method is the possibility of multiplexing the biochemical reactions, i.e. examining multiple genomic regions in parallel. Simulations can help estimate the performance of a method for polymorphism discovery, and allow us to evaluate the influence of method parameters on the discovery rate, and also to investigate whether the method is well suited for a certain genomic region. RESULTS: We show how to efficiently conduct such simulations for polymorphism discovery using base-specific cleavage and MS. Simulating multiplexed polymorphism discovery leads us to the problem of uniformly drawing a multiplex. Given a multiset of natural numbers we want to uniformly draw a subset of fixed cardinality so that the elements sum up to some fixed total length. We show how to enumerate multiplex layouts using dynamic programming, which allows us to uniformly draw a multiplex.
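The drawing problem stated in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes that the items of the multiset are distinguished by their index, and the function names `count_table` and `draw_multiplex` are hypothetical. A counting table is filled by dynamic programming, then a subset is drawn by walking the table backwards and taking each item with probability proportional to the number of completions that include it.

```python
import random

def count_table(lengths, k, total):
    """ways[i][j][s] = number of ways to pick j of the first i items with sum s."""
    n = len(lengths)
    ways = [[[0] * (total + 1) for _ in range(k + 1)] for _ in range(n + 1)]
    ways[0][0][0] = 1
    for i, length in enumerate(lengths, start=1):
        for j in range(k + 1):
            for s in range(total + 1):
                w = ways[i - 1][j][s]                    # item i is skipped
                if j > 0 and s >= length:
                    w += ways[i - 1][j - 1][s - length]  # item i is taken
                ways[i][j][s] = w
    return ways

def draw_multiplex(lengths, k, total, rng=random):
    """Draw k item indices whose lengths sum to total, uniformly over all such subsets."""
    ways = count_table(lengths, k, total)
    n = len(lengths)
    if ways[n][k][total] == 0:
        return None  # no subset of size k reaches the requested total
    chosen, j, s = [], k, total
    for i in range(n, 0, -1):
        length = lengths[i - 1]
        take = ways[i - 1][j - 1][s - length] if j > 0 and s >= length else 0
        # Take item i with probability take / ways[i][j][s]
        if rng.randrange(ways[i][j][s]) < take:
            chosen.append(i - 1)
            j -= 1
            s -= length
    return sorted(chosen)
```

For example, `draw_multiplex([100, 200, 300, 400, 500], 2, 600)` returns either `[0, 4]` or `[1, 3]`, each with probability one half, because those are the only two pairs with total length 600.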

2.
One of the main endeavors in today's life science remains the efficient sequencing of long DNA molecules. Today, most de novo sequencing of DNA is still performed using the electrophoresis-based Sanger concept of 1977, in spite of certain restrictions of this method. Methods using mass spectrometry to acquire the Sanger sequencing data are limited by short sequencing lengths of 15-25 nt. We propose a new method for DNA sequencing using base-specific cleavage and mass spectrometry that appears to be a promising alternative to classical DNA sequencing approaches. A single stranded DNA or RNA molecule is cleaved by a base-specific (bio-)chemical reaction using, for example, RNases. The cleavage reaction is modified such that not all, but only a certain percentage of bases are cleaved. The resulting mixture of fragments is then analyzed using MALDI-TOF mass spectrometry, whereby we acquire the molecular masses of fragments. For every peak in the mass spectrum, we calculate those base compositions that will potentially create a peak of the observed mass and, repeating the cleavage reaction for all four bases, finally try to uniquely reconstruct the underlying sequence from these observed spectra. This leads us to the combinatorial problem of sequencing from compomers and, finally, to the graph-theoretical problem of finding a walk in a subgraph of the de Bruijn graph. Application of this method to simulated data indicates that it might be capable of sequencing DNA molecules with 200+ nt.
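The notion of a compomer (a base composition without order) can be illustrated with a small sketch. It rests on assumptions that are not taken from the abstract: cleavage is modelled as happening directly after each occurrence of the cut base, the cut base stays at the 3' end of the fragment, and no masses are computed. The function names are hypothetical.

```python
from collections import Counter

def partial_cleavage_fragments(seq, cut_base):
    """All fragments that partial cleavage after cut_base could produce."""
    # Possible cut points: both sequence ends plus the position after each cut base
    cuts = [0] + [i + 1 for i, b in enumerate(seq) if b == cut_base]
    if cuts[-1] != len(seq):
        cuts.append(len(seq))
    # With partial cleavage, any two cut points may bound an observed fragment
    return {seq[a:b] for i, a in enumerate(cuts) for b in cuts[i + 1:]}

def compomer(fragment):
    """Base composition as counts of (A, C, G, T); the base order is discarded."""
    counts = Counter(fragment)
    return tuple(counts.get(b, 0) for b in "ACGT")
```

Because a mass peak reflects only the composition, `compomer("ACGTA")` and `compomer("TGCAA")` are identical, which is why the reconstruction step described in the abstract has to combine the spectra of all four cleavage reactions.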

3.
Salmonid genomes are considered to be in a pseudo‐tetraploid state as a result of a genome duplication event that occurred between 25 and 100 Ma. This situation complicates single‐nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and not simple allelic variants. To differentiate PSVs from simple allelic variants, we used 19 homozygous doubled haploid (DH) lines that represent a wide geographical range of rainbow trout populations. In the first phase of the study, we analysed SbfI restriction‐site associated DNA (RAD) sequence data from all 19 lines and selected 11 lines for an extended SNP discovery. In the second phase, we conducted the extended SNP discovery using PstI RAD sequence data from the selected 11 lines. The complete data set is composed of 145 168 high‐quality putative SNPs that were genotyped in at least nine of the 11 lines, of which 71 446 (49%) had minor allele frequencies (MAF) of at least 18% (i.e. at least two of the 11 lines). Approximately 14% of the RAD SNPs in this data set are from expressed or coding rainbow trout sequences. Our comparison of the current data set with previous SNP discovery data sets revealed that 99% of our SNPs are novel. In the support files for this resource, we provide annotation to the positions of the SNPs in the working draft of the rainbow trout reference genome, provide the genotypes of each sample in the discovery panel and identify SNPs that are likely to be in coding sequences.
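The link between the 18% threshold and "two of the 11 lines" in this abstract is simple arithmetic: 2/11 is about 0.18. A minimal sketch follows, assuming that each homozygous DH line contributes a single allele call, that the site is biallelic, and that missing calls are encoded as `None`; the function name is hypothetical.

```python
from collections import Counter

def minor_allele_frequency(calls):
    """MAF over lines with a call; each homozygous line counts as one allele."""
    observed = [c for c in calls if c is not None]
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0  # monomorphic among the genotyped lines, or no calls at all
    return min(counts.values()) / len(observed)

# Two of 11 lines carry the minor allele: 2/11, roughly 18%
maf = minor_allele_frequency(["A"] * 9 + ["G"] * 2)
```

When only nine of the 11 lines are genotyped, two minor-allele lines give 2/9, so the frequency depends on how missing calls are handled; the abstract does not state which denominator was used.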

4.
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless of whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike.

5.
Premise of the study: Next-generation sequencing (NGS) technologies are frequently used for resequencing and mining of single nucleotide polymorphisms (SNPs) by comparison to a reference genome. In crop species such as chickpea (Cicer arietinum) that lack a reference genome sequence, NGS-based SNP discovery is a challenge. Therefore, unlike probability-based statistical approaches for consensus calling and by comparison with a reference sequence, a coverage-based consensus calling (CbCC) approach was applied and two genotypes were compared for SNP identification. Methods: A CbCC approach is used in this study with four commonly used short read alignment tools (Maq, Bowtie, Novoalign, and SOAP2) and 15.7 and 22.1 million Illumina reads for chickpea genotypes ICC4958 and ICC1882, together with the chickpea transcriptome assembly (CaTA). Key results: A nonredundant set of 4543 SNPs was identified between two chickpea genotypes. Experimental validation of 224 randomly selected SNPs showed superiority of Maq among individual tools, as 50.0% of SNPs predicted by Maq were true SNPs. For combinations of two tools, greatest accuracy (55.7%) was reported for Maq and Bowtie, with a combination of Bowtie, Maq, and Novoalign identifying 61.5% true SNPs. SNP prediction accuracy generally increased with increasing read depth. Conclusions: This study provides a benchmark comparison of tools as well as read depths for four commonly used tools for NGS SNP discovery in a crop species without a reference genome sequence. In addition, a large number of SNPs have been identified in chickpea that would be useful for molecular breeding.

6.
Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in population genetics due to a variety of advantages relative to other markers, including higher genomic density, data quality, reproducibility and genotyping efficiency, as well as ease of portability between laboratories. Advances in sequencing technology and methodologies to reduce genomic representation have made the isolation of SNPs feasible for nonmodel organisms. RNA‐seq is one such technique for the discovery of SNPs and development of markers for large‐scale genotyping. Here, we report the development of 192 validated SNP markers for parentage analysis in Tripterygion delaisi (the black‐faced blenny), a small rocky‐shore fish from the Mediterranean Sea. RNA‐seq data for 15 individual samples were used for SNP discovery by applying a series of selection criteria. Genotypes were then collected from 1599 individuals from the same population with the resulting loci. Differences in heterozygosity and allele frequencies were found between the two data sets. Heterozygosity was lower, on average, in the population sample, and the mean difference between the frequencies of particular alleles in the two data sets was 0.135 ± 0.100. We used bootstrap resampling of the sequence data to predict appropriate sample sizes for SNP discovery. As cDNA library production is time‐consuming and expensive, we suggest that using seven individuals for RNA sequencing reduces the probability of discarding highly informative SNP loci, due to lack of observed polymorphism, whereas use of more than 12 samples does not considerably improve prediction of true allele frequencies.
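The sample-size question raised in this abstract can be explored with a short sketch. It is not the authors' bootstrap procedure: the first function assumes that the 2n allele copies of n diploid individuals are drawn independently with a known minor allele frequency, and the second resamples individuals with replacement from a hypothetical list of genotype pairs. All names and default values are illustrative.

```python
import random

def prob_polymorphic(maf, n_individuals):
    """Chance that both alleles are seen among 2n independently drawn allele copies."""
    copies = 2 * n_individuals
    return 1.0 - maf ** copies - (1.0 - maf) ** copies

def bootstrap_polymorphic_rate(genotypes, n_individuals, n_boot=1000, rng=None):
    """Fraction of bootstrap samples of n individuals in which the locus looks polymorphic."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(genotypes) for _ in range(n_individuals)]
        alleles = {allele for genotype in sample for allele in genotype}
        hits += len(alleles) > 1
    return hits / n_boot
```

Under the independence assumption, a locus with a minor allele frequency of 0.1 appears polymorphic in about 77% of samples of seven individuals, and the probability rises with every added individual at a decreasing rate.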

7.
The benefits from recent improvements in sequencing technologies, such as the Roche GS FLX (454) pyrosequencing, may be even more valuable in non-model organisms, such as many plant pathogenic fungi of economic importance. One application of this new sequencing technology is the rapid generation of genomic information to identify putative single-nucleotide polymorphisms (SNPs) to be used for population genetic, evolutionary, and phylogeographic studies on non-model organisms. The focus of this research was to sequence, assemble, discover and validate SNPs in a fungal genome using 454 pyrosequencing when no reference sequence is available. Genomic DNA from eight isolates of Ophiognomonia clavigignenti-juglandacearum was pooled in one region of a four-region sequencing run on a Roche 454 GS FLX. This yielded 71 million total bases comprising 217,000 reads, 80% of which collapsed into 16,125,754 bases in 30,339 contigs upon assembly. By aligning reads from multiple isolates, we detected 298 SNPs using Roche's GS Mapper. With no reference sequence available, however, it was difficult to distinguish true polymorphisms from sequencing error. Eagleview software was used to manually examine each contig that contained one or more putative SNPs, enabling us to discard all but 45 of the original 298 putative SNPs. Of those 45 SNPs, 13 were validated using standard Sanger sequencing. This research provides a valuable genetic resource for research into the genus Ophiognomonia, demonstrates a framework for the rapid and cost-effective discovery of SNP markers in non-model organisms and should prove especially useful in the case of asexual or clonal fungi with limited genetic variability.

8.
BACKGROUND: A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). RESULTS: The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here for using SOLiD or other types of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having 'Chandler' as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome. CONCLUSION: The updated AGSNP pipeline is a versatile SNP discovery tool for a high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs was identified in a single walnut cultivar.
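The idea of using relative read depth to separate single-copy from repeated sequences, as described in this abstract, might look like the sketch below. It is not the AGSNP implementation: the function name, the use of the median as the typical depth, and the `low` and `high` thresholds are all assumptions chosen for illustration.

```python
from statistics import median

def single_copy_candidates(depths, low=0.5, high=1.5):
    """Keep sequences whose mapped-read depth is close to the typical depth."""
    typical = median(depths.values())
    if typical == 0:
        return []  # no usable coverage, so nothing can be classified
    # Repeats attract reads from several copies and show inflated relative depth
    return sorted(name for name, d in depths.items() if low <= d / typical <= high)

# Hypothetical mean depths for five reference sequences
depths = {"bes1": 30, "bes2": 33, "bes3": 95, "bes4": 31, "bes5": 5}
kept = single_copy_candidates(depths)
```

With these made-up numbers, `bes3` is rejected as a likely repeat (about three times the median depth) and `bes5` as poorly covered, which leaves `bes1`, `bes2` and `bes4`.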

9.
10.
Single nucleotide polymorphisms (SNPs) are now widely used for many DNA analysis applications such as linkage disequilibrium mapping, pharmacogenomics and traceability. Many methods for SNP genotyping exist with diverse strategies for allele-distinction. Mass spectrometers are used most commonly in conjunction with primer extension procedures with allele-specific termination. Here we present a novel concept for allele-preparation for SNP genotyping. Primer extension is carried out with an extension primer positioned immediately upstream of the SNP that is to be genotyped, a complete set of four ribonucleotides and a ribonucleotide incorporating DNA polymerase. The allele-extension products are then treated with alkali, which results in the cleavage immediately after the first added ribonucleotide. In addition, to obtain fragments easily detectable by mass spectrometry, we have included a ribonucleotide in the primer usually at the fourth nucleotide from the 3′ terminus. The method was tested on four SNPs each with a different combination of nucleotides. The advantage over other mass spectrometry-based SNP genotyping assays is that this one only requires a PCR, a primer extension reaction with a universal extension mix and an inexpensive facile cleavage reaction, which makes it overall very cost effective and easy to handle.

11.
The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in annotation of individual SNPs. Currently, these data are only available through curation, or comparison to a reference genome. Many species lack a reference genome, but are still important genetic models or are significant species in agricultural production or natural ecosystems. For these species, it is possible to annotate SNPs through comparison with cDNA, or data from well‐annotated genes in public repositories. We present SNPMeta, a tool which gathers information about SNPs by comparison with sequences present in GenBank databases. SNPMeta is able to annotate SNPs from contextual sequence in SNP assay designs, and SNPs discovered through genotyping by sequencing (GBS) approaches. However, SNPs discovered through GBS occur throughout the genome, rather than only in gene space, and therefore do not annotate at high rates. SNPMeta can therefore be used to annotate SNPs in nonmodel species or species that lack a reference genome. Annotations generated by SNPMeta are highly concordant with annotations that would be obtained from a reference genome.

12.
High-throughput procedures are an important requirement for future large-scale genetic studies such as genotyping of single nucleotide polymorphisms (SNPs). Matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) has revolutionised the analysis of biomolecules and, in particular, provides a very attractive solution for the rapid typing of DNA. The analysis of DNA by MALDI can be significantly facilitated by a procedure termed ‘charge-tagging’. We show here a novel approach for the generation of charge-tagged DNA using a photocleavable linker and its implementation in a molecular biological procedure for SNP genotyping consisting of PCR, primer extension, photocleavage and a chemical reaction prior to MALDI target preparation and analysis. The reaction sequence is amenable to liquid handling automation and requires no stringent purification procedures. We demonstrate this new method on SNPs in two genes involved in complex traits.

13.
CpG methylation is a key component of the epigenome architecture that is associated with changes in gene expression without a change to the DNA sequence. Since the first reports on deregulation of DNA methylation, in diseases such as cancer, and the initiation of the Human Epigenome Project, an increasing need has arisen for a detailed, high-throughput and quantitative method of analysis to discover and validate normal and aberrant DNA methylation profiles in large sample cohorts. Here we present an improved protocol using base-specific fragmentation and MALDI-TOF mass spectrometry that enables a sensitive and high-throughput method of DNA methylation analysis, quantitative to 5% methylation for each informative CpG residue. We have determined the accuracy, variability and sensitivity of the protocol, implemented critical improvements in experimental design and interpretation of the data and developed a new formula to accurately measure CpG methylation. Key innovations now permit determination of differential and allele-specific methylation, such as in cancer and imprinting. The new protocol is ideally suited for detailed DNA methylation analysis of multiple genomic regions and large sample cohorts that is critical for comprehensive profiling of normal and diseased human epigenomes.

14.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation. SNPs are important markers that link sequence variations to phenotypic changes. Because of the importance of SNPs in the life and medical sciences, a great deal of effort has been devoted to developing accurate, rapid, and cost-effective technologies for SNP analysis. In this article, we describe a novel method for SNP genotyping based on differential fluorescence emission due to cleavage by Thermus thermophilus RNase HII (TthRNase HII) of DNA heteroduplexes containing an SNP site-specific chimeric DNA-rN1-DNA molecular beacon (cMB). We constructed a loop sequence for a cMB that contains a single SNP-specific ribonucleotide at the central site. When the cMB probe is hybridized to a target double-stranded DNA (dsDNA), a perfect match of the cMB/DNA duplex permits efficient cleavage with TthRNase HII, whereas a mismatch in the duplex due to an SNP greatly reduces efficiency. Cleavage efficiency is measured by the incremental difference of fluorescence emission of the beacon. We show that the genotypes of 10 individuals at 12 SNP sites across a series of human leukocyte antigen (HLA) can be determined correctly with respect to conventional DNA sequencing. This novel TthRNase HII-based method offers a platform for easy and accurate SNP analysis.

15.
16.
17.
Discovering single nucleotide polymorphisms (SNPs) in specific genes in a heterozygous polyploid plant species, such as sugarcane, is challenging because of the presence of a large number of homologues. To discover SNPs for mapping genes of interest, 454 sequencing of 307 polymerase chain reaction (PCR) amplicons (> 59 kb of sequence) was undertaken. One region of a four-gasket sequencing run, on a 454 Genome Sequencer FLX, was used for pooled PCR products amplified from each parent of a quantitative trait locus (QTL) mapping population (IJ76-514 × Q165). The sequencing yielded 96 755 (IJ76-514) and 86 241 (Q165) sequences with perfect matches to a PCR primer used in amplification, with an average sequence depth of approximately 300 and an average read length of 220 bases. Further analysis was carried out on amplicons whose sequences clustered into a single contig using an identity of 80% with the program CAP3. In the more polymorphic sugarcane parent (Q165), 94% of amplicons (227/242) had evidence of a reliable SNP, an average of one every 35 bases. Significantly fewer SNPs were found in the pure Saccharum officinarum parent, with one SNP every 58 bases and SNPs in 86% (213/247) of amplicons. Using automatic SNP detection, 1632 SNPs were detected in Q165 sequences and 1013 in IJ76-514. From 225 candidate SNP sites tested, 209 (93%) were validated as polymorphic using the Sequenom MassARRAY system. Amplicon re-sequencing using the 454 system enables cost-effective SNP discovery that can be targeted to genes of interest and is able to perform in the highly challenging area of polyploid genomes.

18.
In this study, we identified porcine single nucleotide polymorphisms (SNPs) by aligning eight sequences generated with two approaches: amplification of 665 intronic regions using one sample from each of eight breeds, including three East Asian pigs, and amplification of 289 3'-UTR regions using two samples from each of four major commercial breeds. The 1,760 and 599 SNPs were validated using two 384-sample DNA panels by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. The phylogenetic tree and Structure analyses classified the pigs into two large clusters: Euro-American and East Asian populations. The membership proportions, however, differed between inferred clusters for K = 2 generated by the two approaches. With intronic SNPs, Euro-American breeds constituted about 100% of the Euro-American cluster, but with 3'-UTR SNPs, about 17% of the East Asian cluster comprised five Euro-American breeds. The differences in the SNP discovery panels may affect population structure found in study panels of large samples.

19.
R Kota, M Wolf, W Michalek, A Graner. Génome, 2001, 44(4): 523-528
Recent advances in DNA sequence analysis and the establishment of high-throughput assays have provided the framework for large-scale discovery and analysis of DNA sequence variation. In this context, single nucleotide polymorphisms (SNPs) are of particular interest. To initiate a systematic approach to develop an SNP map of barley (Hordeum vulgare L.), we have employed denaturing high-performance liquid chromatography (DHPLC) to analyse segregating SNP patterns in a doubled-haploid (DH) mapping population. To this end, SNPs between the parental genotypes were identified using a direct sequencing approach. Once an SNP was established between the parents, the optimal melting temperature of the PCR fragment containing the SNP was predicted for its analysis by DHPLC. Following the detection of the optimal temperature, the DH lines were analysed for the presence of either of the alleles. To test the utility of the analysis, data from previously mapped RFLP markers from which these SNPs were derived were compared. Results from these experiments indicate that DHPLC can be efficiently employed in analysing SNPs on a high-throughput scale.

20.
We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.
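The redundancy idea in this abstract, namely that a variant seen in several reads is less likely to be a sequencing error, can be sketched as follows. This is an illustration and not the published method: it assumes equal-length, gap-padded reads, handles substitutions only, and defines the redundancy score as the number of reads that carry the second most common base in a column. The function name is hypothetical.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_redundancy=2):
    """Report columns where a second allele is supported by enough reads.

    Returns tuples of (position, major base, minor base, minor base count).
    """
    candidates = []
    for pos, column in enumerate(zip(*aligned_reads)):
        # Gaps and ambiguous calls are ignored in this simplified sketch
        counts = Counter(base for base in column if base in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_redundancy:
            candidates.append((pos, ranked[0][0], ranked[1][0], ranked[1][1]))
    return candidates

reads = ["ACGTA", "ACGTA", "ATGTA", "ATGTC", "ACGTA"]
calls = candidate_snps(reads)
```

Here position 1 (C in three reads, T in two) is reported, while the single C at position 4 is treated as a probable sequencing error because only one read supports it.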
