Similar articles
1.
2.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rate of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3 % across the four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than that of SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific variants (89.6 %) were higher than that of TargetSeq-Proton-specific variants (15.8 %).

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by either platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, particularly, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.
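
The shared-locus rate and concordance reported in the Results of this abstract can be illustrated with a small sketch. It assumes each platform's calls are stored as a dictionary mapping (chromosome, position) to a genotype string, takes the shared rate as shared loci divided by the union of loci, and takes concordance as the fraction of shared loci with identical genotypes. These definitions, the function name `compare_calls` and the toy data are assumptions for illustration only; the abstract does not state the exact formulas the authors used.

```python
def compare_calls(calls_a, calls_b):
    """Return (shared_rate, concordance) for two variant call sets.

    calls_a, calls_b: dict mapping (chrom, pos) -> genotype string.
    shared_rate  = loci called by both / loci called by either (assumed definition).
    concordance  = shared loci with identical genotype / shared loci (assumed definition).
    """
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    shared_rate = len(shared) / len(union) if union else 0.0
    same_genotype = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    concordance = same_genotype / len(shared) if shared else 0.0
    return shared_rate, concordance


# Toy call sets (hypothetical data, not taken from the study).
proton_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr2", 50): "G/G"}
hiseq_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr3", 10): "T/T"}

shared_rate, concordance = compare_calls(proton_calls, hiseq_calls)
print(f"shared rate: {shared_rate:.1%}, concordance: {concordance:.1%}")
```

In practice the two call sets would first be restricted to the region common to both capture kits, as the abstract describes, before any such comparison.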

3.

Background  

Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae).

4.
Next-generation sequencing (NGS) has proven a valuable platform for obtaining large numbers of sequences quickly, easily and at relatively low cost. In this study we used shotgun sequencing on the Illumina HiSeq 2000 to obtain enough sequences to assemble the mitochondrial genome of the bryozoan Membranipora grandicella (Bryozoa: Cheilostomatida), which is the first representative of the suborder Malacostegina. The complete mitochondrial genome is 15,861 bp in length, which is larger than those of other studied bryozoans. The mitochondrial genome contains 13 protein-coding genes, 2 ribosomal RNAs and 20 transfer RNAs. To investigate the phylogenetic position and the inner relationships of the phylum Bryozoa, phylogenetic trees were constructed with amino acid sequences of 11 PCGs from 30 metazoans. Two superclades of protostomes, namely Lophotrochozoa and Ecdysozoa, are recovered as monophyletic with strong support in both ML and Bayesian analyses. Somewhat surprisingly, Bryozoa appears as the sister group of Chaetognatha with moderate or high support. The relationship among five bryozoans is Tubulipora flabellaris + (M. grandicella + (Flustrellidra hispida + (Bugula neritina + Watersipora subtorquata))), which supports the view that Cheilostomatida is not a natural, monophyletic clade. NGS proved to be a quick and easy method for sequencing a complete mitochondrial genome.

5.
DNA sequencing continues to decrease in cost with the Illumina HiSeq2000 generating up to 600 Gb of paired-end 100 base reads in a ten-day run. Here we present a protocol for community amplicon sequencing on the HiSeq2000 and MiSeq Illumina platforms, and apply that protocol to sequence 24 microbial communities from host-associated and free-living environments. A critical question as more sequencing platforms become available is whether biological conclusions derived on one platform are consistent with what would be derived on a different platform. We show that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.

6.

Background  

Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for the de novo assembly in terms of assembly quality and scalability for large-scale short read datasets.

7.
Mungbean [Vigna radiata (L.) Wilczek], a self-pollinated diploid plant with 2n = 22 chromosomes, is an important legume crop with a high-quality amino acid profile. Sequence variation at the whole-genome level was examined by comparing two mungbean cultivars, Sunhwanokdu and Gyeonggijaerae 5, using Illumina HiSeq sequencing data. More than 40 billion bp from both mungbean cultivars were sequenced to a depth of 72×. After de novo assembly of Sunhwanokdu contigs by ABySS 1.3.2 (N50 = 9,958 bp), those longer than 10 kb were aligned with Gyeonggijaerae 5 reads using the Burrows–Wheeler Aligner. SAMTools was used for retrieving single nucleotide polymorphisms (SNPs) between Sunhwanokdu and Gyeonggijaerae 5, defining the lowest and highest depths as 5 and 100, respectively, and the sequence quality as 100. Of the 305,504 single-base changes identified, 40,503 SNPs were considered heterozygous in Gyeonggijaerae 5. Among the remaining 265,001 SNPs, 65.9 % (174,579 cases) were transitions and 34.1 % (90,422 cases) were transversions. For SNP validation, a total of 42 SNPs were chosen among Sunhwanokdu contigs longer than 10 kb and sharing at least 80 % sequence identity with common bean expressed sequence tags as determined with est2genome. Using seven mungbean cultivars from various origins in addition to Sunhwanokdu and Gyeonggijaerae 5, most of the SNPs identified by bioinformatics tools were confirmed by Sanger sequencing. These genome-wide SNP markers could enrich the current molecular resources and might be of value for the construction of a mungbean genetic map and the investigation of genetic diversity.
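
The transition and transversion counts in this abstract follow from a simple base-class rule, sketched below. A substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition; a substitution between the two classes is a transversion. The function names `classify_snp` and `percent` are hypothetical helpers, not part of any tool named in the abstract; the counts passed to `percent` are the ones quoted above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts quoted in the abstract: 265,001 SNPs remain after removing heterozygous calls.
homozygous_snps = 305_504 - 40_503
print(homozygous_snps, percent(174_579, homozygous_snps), percent(90_422, homozygous_snps))
```

The arithmetic is consistent with the abstract: 174,579 and 90,422 sum to 265,001 and give 65.9 % and 34.1 % respectively.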

8.
Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within‐population variation. Additionally, a public Illumina data set was used to validate the pipeline on community‐level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within‐population structure but also the successful application of the QRS pipeline on Illumina‐generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences .

9.
Little is known about the variations of nematode mitogenomes (mtDNA). Sequencing a complete mtDNA using a PCR approach remains a challenge due to frequent genome reorganizations and low sequence similarities between divergent nematode lineages. Here, a genome skimming approach based on HiSeq sequencing (shotgun) was used to assemble de novo the first complete mtDNA sequence of a root-knot nematode (Meloidogyne graminicola). An AT-rich genome (84.3%) of 20,030 bp was obtained with a mean sequencing depth greater than 300×. Thirty-six genes were identified with a semi-automated approach. A comparison with a gene map of the M. javanica mitochondrial genome indicates that the gene order is conserved within this nematode lineage. However, deep genome rearrangements were observed when comparing with other species of the superfamily Hoplolaimoidea. Repeat elements of 111 bp and 94 bp were found in a long non-coding region of 7.5 kb, as similarly reported in M. javanica and M. hapla. This study points out the power of next generation sequencing to produce complete mitochondrial genomes, even without a reference sequence, and may open new avenues for species/race identification, phylogenetics and population genetics of nematodes.
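
The AT richness quoted for this mitochondrial genome (84.3%) is a base-composition fraction, and a minimal sketch of how such a figure can be computed is given here. The function name `at_content` and the choice to ignore ambiguous bases such as N are assumptions for illustration; the abstract does not say how the authors handled ambiguity codes.

```python
def at_content(sequence):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]  # ambiguous bases are skipped
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


# Toy sequence (hypothetical, not from the assembled genome).
print(f"{at_content('ATTTAAGCATNNTA'):.1%}")
```

Applied to a full assembled sequence read from a FASTA file, the same function would give the genome-wide figure.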

10.
Journal of Genetics and Genomics (《遗传学报》), 2021, 48(8): 671-680
DNA sequencing is vital for many aspects of biological research and diagnostics. Despite the development of second- and third-generation sequencing technologies, Sanger sequencing has long been the only choice when each sequenced plasmid or DNA fragment must be tracked precisely. Here, we report a complete novel barcoding and assembly system, Highly-parallel Indexed Tagmentation-reads Assembled Consensus sequencing (HITAC-seq), that can massively sequence and track the identity of each individual sequencing sample. At a cost much lower than that of a single Sanger read, HITAC-seq can generate high-quality contiguous sequences of up to 10 kilobases or longer. The capability of HITAC-seq was confirmed through large-scale sequencing of thousands of plasmid clones and hundreds of amplicon fragments using approximately 100 pg of input DNA. Owing to its long synthetic length, HITAC-seq was effective in detecting relatively large structural variations, as demonstrated by the identification of a ~1.3 kb Copia retrotransposon insertion upstream of a likely maize domestication gene. Besides being a practical alternative to traditional Sanger sequencing, HITAC-seq is suitable for many high-throughput sequencing and genotyping applications.

11.
Here, we present an adaptation of restriction‐site‐associated DNA sequencing (RAD‐seq) to the Illumina HiSeq2000 technology that we used to produce SNP markers in very large quantities at low cost per unit in the Réunion grey white‐eye (Zosterops borbonicus), a nonmodel passerine bird species with no reference genome. We sequenced a set of six pools of 18–25 individuals using a single sequencing lane. This allowed us to build around 600 000 contigs, among which at least 386 000 could be mapped to the zebra finch (Taeniopygia guttata) genome. This yielded more than 80 000 SNPs that could be mapped unambiguously and are evenly distributed across the genome. Thus, our approach provides a good illustration of the high potential of paired‐end RAD sequencing of pooled DNA samples combined with comparative assembly to the zebra finch genome to build large contigs and characterize vast numbers of informative SNPs in nonmodel passerine bird species in a very efficient and cost‐effective way.

12.
Arbuscular mycorrhizal (AM) fungi are known to exhibit high intra‐organism genetic variation. However, information about intra‐ vs. interspecific variation among the genes commonly used in diversity surveys is limited. Here, the nuclear small subunit (SSU) rRNA gene, internal transcribed spacer (ITS) region and large subunit (LSU) rRNA gene portions were sequenced from 3 to 5 individual spores from each of two isolates of Rhizophagus irregularis and Gigaspora margarita. A total of 1482 Sanger sequences (0.5 Mb) from 239 clones were obtained, spanning ~4370 bp of the ribosomal operon when concatenated. Intrasporal and intra‐isolate sequence variation was high for all three regions even though variant numbers were not exhausted by sequencing 12–40 clones per isolate. Intra‐isolate nucleotide variation levels followed the expected order of ITS > LSU > SSU, but the values were strongly dependent on isolate identity. Single nucleotide polymorphism (SNP) densities over 4 SNP/kb in the ribosomal operon were detected in all four isolates. Automated operational taxonomic unit picking within the sequence set of known identity overestimated species richness with almost all cut‐off levels, markers and isolates. Average intraspecific sequence similarity values were 99%, 96% and 94% for amplicons in SSU, LSU and ITS, respectively. The suitability of the central part of the SSU as a marker for AM fungal community surveys was further supported by its level of nucleotide variation, which is similar to that of the ITS region; its alignability across the entire phylum; its appropriate length for next‐generation sequencing; and its ease of amplification in single‐step PCR.

13.
Traditional approaches for sequencing insertion ends of bacterial artificial chromosome (BAC) libraries are laborious and expensive, which are currently some of the bottlenecks limiting a better understanding of the genomic features of auto‐ or allopolyploid species. Here, we developed a highly efficient and low‐cost BAC end analysis protocol, named BAC‐anchor, to identify paired‐end reads containing large internal gaps. Our approach mainly focused on the identification of high‐throughput sequencing reads carrying restriction enzyme cutting sites and searching for large internal gaps based on the mapping locations of both ends of the reads. We sequenced and analysed eight libraries containing over 3 200 000 BAC end clones derived from the BAC library of the tetraploid potato cultivar C88 digested with two restriction enzymes, Cla I and Mlu I. About 25% of the BAC end reads carrying cutting sites generated a 60–100 kb internal gap in the potato DM reference genome, which was consistent with the mapping results of Sanger sequencing of the BAC end clones and indicated large differences between autotetraploid and haploid genotypes in potato. A total of 5341 Cla I‐ and 165 Mlu I‐derived unique reads were distributed on different chromosomes of the DM reference genome and could be used to establish a physical map of target regions and assemble the C88 genome. The reads that matched different chromosomes are especially significant for the further assembly of complex polyploid genomes. Our study provides an example of analysing high‐coverage BAC end libraries with low sequencing cost and is a resource for further genome sequencing studies.

14.
Conserved chloroplast (cp) DNA primer pairs are useful in plant molecular genetics, evolution and ecology. We have designed 20 conserved cpDNA primer pairs that, in combination with 18 previously described ones, amplify overlapping fragments (mean size of 2.5 kb) spanning the large single copy (LSC) region from Eudicots. These 38 primer pairs as well as eight primer pairs flanking cpDNA microsatellites were tested on 20 plant species belonging to 13 families. At least 79% and up to 100% of the LSC (> 86 kb) can be amplified. Many primer pairs are robust and work with all species.

15.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next‐generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century‐old type specimens of Lepidoptera by attempting to recover 164‐bp and 94‐bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories – high (164‐bp sequence), medium (94‐bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR‐based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458 bp to 610 bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.

16.
Jatropha curcas is an important non-edible oil seed tree species and is considered a promising source of biodiesel. The complete nucleotide sequence of the J. curcas chloroplast genome (cpDNA) was determined by pyrosequencing, with gaps filled by Sanger sequencing. The cpDNA is a circular molecule of 163,856 bp in length and codes for 110 distinct genes (78 protein coding, four rRNA and 28 distinct tRNA). Genome organisation and arrangement are similar to the reported angiosperm chloroplast genome. However, in Jatropha, the infA and the rps16 genes are non-functional. The inverted repeat (IR) boundary is within the rpl2 gene, and the 13 nucleotides at the ends of the two duplicate genes are different. Repeat analysis suggests the presence of 72 repeat regions (>30 bp) apart from the IR; of these, 48 were direct and 24 were palindromic repeats. Phylogenetic analysis of 81 protein coding chloroplast genes from 65 taxa by maximum parsimony, maximum likelihood and minimum evolution analyses at 100 bootstraps provides strong support for the placement of inaperturate crotonoids, of which Jatropha is a member, as sister to articulated crotonoids, of which Manihot is a member.

17.
Whole genome resequencing of 51 Populus nigra (L.) individuals from across Western Europe was performed using Illumina platforms. A total number of 1 878 727 SNPs distributed along the P. nigra reference sequence were identified. The SNP calling accuracy was validated with Sanger sequencing. SNPs were selected within 14 previously identified QTL regions, 2916 expressional candidate genes related to rust resistance, wood properties, water‐use efficiency and bud phenology and 1732 genes randomly spread across the genome. Over 10 000 SNPs were selected for the construction of a 12k Infinium Bead‐Chip array dedicated to association mapping. The SNP genotyping assay was performed with 888 P. nigra individuals. The genotyping success rate was 91%. Our high success rate was due to the discovery panel design and the stringent parameters applied for SNP calling and selection. In the same set of P. nigra genotypes, linkage disequilibrium throughout the genome decayed on average within 5–7 kb to half of its maximum value. As an application test, ADMIXTURE analysis was performed with a selection of 600 SNPs spread throughout the genome and 706 individuals collected along 12 river basins. The admixture pattern was consistent with genetic diversity revealed by neutral markers and the geographical distribution of the populations. These newly developed SNP resources and genotyping array provide a valuable tool for population genetic studies and identification of QTLs through natural‐population based genetic association studies in P. nigra.

18.
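The monokaryotic enoki mushroom (金针菇) strain "6-3" was sequenced with both second- and third-generation sequencing technologies, and four assembly strategies were applied for de novo genome assembly and compared. In terms of assembly metrics, assembly from second-generation reads alone performed worst: contigs longer than 10 kb totalled only 24.6 Mb, the contig N50 was only 23 kb, and the assembly rate was only 59.27%. Assembling third-generation reads and correcting the result with second-generation reads performed best: contigs longer than 10 kb totalled 38.3 Mb, the contig N50 was 2.8 Mb, and the assembly rate reached 92.16%. For conserved single-copy genes, the genome sequences from all four strategies were compared against the conserved single-copy genes of Basidiomycota in the BUSCO database, and gene completeness exceeded 94% in every case. For assembly accuracy, PCR amplification followed by Sanger sequencing showed that the genome sequence from third-generation assembly with second-generation correction was complete and contiguous and carried the fewest SNPs and InDels. In summary, the genome sequence obtained by third-generation assembly with second-generation correction has a large contig N50, a high assembly rate and high base accuracy, making this a relatively ideal approach for sequencing the genomes of edible fungi.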
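
Contig N50, the main contiguity metric compared in this abstract, has a standard definition that can be sketched briefly: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `contig_n50` and the toy lengths are hypothetical; the abstract does not say which tool computed its N50 values.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical): total 32, so half is 16,
# reached after the two contigs of length 8.
print(contig_n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

A larger N50 means more of the assembly sits in long contigs, which is why the 2.8 Mb value is read as far more contiguous than the 23 kb value.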

19.

Background  

The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases.

20.
Adzuki bean, also known as red bean (Vigna angularis), with 2n = 22 chromosomes, is an important legume crop in East Asian countries, including China, Japan, and Korea. For single nucleotide polymorphism (SNP) discovery, we used Vigna accessions, V. angularis IT213134 and its wild relative V. nakashimae IT178530, because of the lack of DNA sequence polymorphism in the cultivated species. Short read sequences of IT213134 and IT178530 of approximately 37 billion and 35 billion bp were produced using the Illumina HiSeq 2000 system to a sequencing depth of 61.5× and 57.7×, respectively. After de novo assembly was carried out with trimmed HiSeq reads from IT213134, 98,441 contigs of various sizes were produced with N50 of 13,755 bp. Using Burrows–Wheeler Aligner software, trimmed short reads of V. nakashimae IT178530 were successfully mapped to IT213134 contigs. All sequence variations at the whole-genome level were examined between the two Vigna species. Of the 1,565,699 SNPs, 59.4 % were transitions and 40.6 % were transversions. A total of 213,758 SNPs, consisting of 122,327 non-synonymous and 91,431 synonymous SNPs, were identified in coding sequences. For SNP validation, 96 SNPs in the genic region were chosen from among IT213134 contigs longer than 10 kb. Of these 96 SNPs, 88 were confirmed by Sanger sequencing of 10 adzuki bean genotypes from various geographic origins as well as IT213134 and its wild relative IT178530. These genome-wide SNP markers will enrich the existing Vigna resources and, specifically, could be of value for constructing a genetic map and evaluating the genetic diversity of adzuki bean.
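
The sequencing depths in this abstract relate total sequenced bases to genome size through depth = total bases / genome size. The sketch below rearranges that relation as a rough back-of-envelope check. The function name `implied_genome_size_bp` is hypothetical, the inputs are the rounded figures from the abstract ("approximately 37 billion and 35 billion bp"), and the resulting sizes are only what those rounded numbers imply, not values reported by the authors.

```python
def implied_genome_size_bp(total_bases, depth):
    """Genome size implied by depth = total_bases / genome_size."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth


# Rounded figures quoted in the abstract.
for accession, bases, depth in [("IT213134", 37e9, 61.5), ("IT178530", 35e9, 57.7)]:
    size_mb = implied_genome_size_bp(bases, depth) / 1e6
    print(f"{accession}: about {size_mb:.0f} Mb implied")
```

Both accessions imply a size of roughly 600 Mb, which suggests the two depth figures were computed against a similar genome-size estimate.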
