Similar literature
1.
We describe an approach for targeted genome resequencing, called oligonucleotide-selective sequencing (OS-Seq), in which we modify the immobilized lawn of oligonucleotide primers of a next-generation DNA sequencer to function as both a capture and sequencing substrate. We apply OS-Seq to resequence the exons of either 10 or 344 cancer genes from human DNA samples. In our assessment of capture performance, >87% of the captured sequence originated from the intended target region with sequencing coverage falling within a tenfold range for a majority of all targets. Single nucleotide variants (SNVs) called from OS-Seq data agreed with >95% of variants obtained from whole-genome sequencing of the same individual. We also demonstrate mutation discovery from a colorectal cancer tumor sample matched with normal tissue. Overall, we show the robust performance and utility of OS-Seq for the resequencing analysis of human germline and cancer genomes.

2.
Single-cell genome sequencing methods are challenged by poor physical coverage and high error rates, making it difficult to distinguish real biological variants from technical artifacts. To address this problem, we developed a method called SNES that combines flow-sorting of single G1/0 or G2/M nuclei, time-limited multiple-displacement-amplification, exome capture, and next-generation sequencing to generate high coverage (96%) data from single human cells. We validated our method in a fibroblast cell line, and show low allelic dropout and false-positive error rates, resulting in high detection efficiencies for single nucleotide variants (92%) and indels (85%) in single cells.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0616-2) contains supplementary material, which is available to authorized users.

3.

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.

4.
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.

5.
Genomics, 2020, 112(2): 1245-1256
Genetic laboratories use custom-commercial targeted next-generation sequencing (tg-NGS) assays to identify disease-causing variants. Although the high coverage achieved with these tests allows for the detection of copy number variants (CNVs), which account for an important proportion of the genetic burden in human diseases, an easy-to-use tool for automatic CNV detection is still lacking. This article presents a new CNV detection tool optimized for tg-NGS data: PattRec. PattRec was evaluated using a wide range of data, and its performance was compared with that of other CNV detection tools. The software includes features for selecting optimal controls, discarding polymorphic CNVs prior to analysis, and filtering out deletions based on SNV zygosity, and automatically creates an in-house CNV database. There is no need for high-level bioinformatics expertise, and users can choose color-coded xlsx output that helps to prioritize potentially pathogenic CNVs. PattRec is presented as a Java-based GUI, freely available online: https://github.com/irotero/PattRec.

6.
With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement, that is, before sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.
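
The simple binomial baseline mentioned in this abstract can be illustrated with a short sketch. It is not the authors' branching-process model; the function names `binomial_vaf_sd` and `binomial_cdf` are hypothetical and assume each read samples the variant allele independently with probability p = 0.5.

```python
import math


def binomial_vaf_sd(depth, p=0.5):
    """Standard deviation of the observed variant allele fraction at a
    heterozygous locus under a plain binomial model (assumed baseline)."""
    return math.sqrt(p * (1.0 - p) / depth)


def binomial_cdf(k, depth, p=0.5):
    """Probability of observing at most k variant reads out of `depth`
    reads under the same binomial assumption."""
    return sum(
        math.comb(depth, i) * p**i * (1.0 - p) ** (depth - i)
        for i in range(k + 1)
    )


if __name__ == "__main__":
    # Expected spread of the allele fraction at a few sequencing depths.
    for depth in (20, 100, 400):
        print(depth, round(binomial_vaf_sd(depth), 4))
```

If real data show a wider spread than `binomial_vaf_sd` predicts at the same depth, that is the kind of excess variance the abstract reports.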

7.
Dilated cardiomyopathy (DCM) commonly causes heart failure and shows extensive genetic heterogeneity that may be amenable to newly developed next-generation DNA sequencing of the exome. In this study we report the successful use of exome sequencing to identify a pathogenic variant in the TNNT2 gene using segregation analysis in a large DCM family. Exome sequencing was performed on three distant relatives from a large family with a clear DCM phenotype. Missense, nonsense, and splice variants were analyzed for segregation among the three affected family members and confirmed in other relatives by direct sequencing. A c.517C>T, Arg173Trp TNNT2 variant segregated with all affected family members and was also detected in one additional DCM family in our registry. The inclusion of segregation analysis using distant family members markedly improved the bioinformatics filtering process by removing from consideration variants that were not shared by all affected subjects. Haplotype analysis confirmed that the variant found in both DCM families was located on two distinct haplotypes, supporting the notion of independent mutational events in each family. In conclusion, an exome sequencing strategy that includes segregation analysis using distant affected relatives within a family represents a viable diagnostic strategy in a genetically heterogeneous disease like DCM.

8.
During the course of the COVID-19 pandemic, large-scale genome sequencing of SARS-CoV-2 has been useful in tracking its spread and in identifying variants of concern (VOC). Viral and host factors could contribute to variability within a host that can be captured in next-generation sequencing reads as intra-host single nucleotide variations (iSNVs). Analysing 1347 samples collected until June 2020, we recorded 16 410 iSNV sites throughout the SARS-CoV-2 genome. We found ∼42% of the iSNV sites to be reported as SNVs by 30 September 2020 in consensus sequences submitted to GISAID, which increased to ∼80% by 30 June 2021. Following this, analysis of another set of 1774 samples sequenced in India between November 2020 and May 2021 revealed that the majority of the Delta (B.1.617.2) and Kappa (B.1.617.1) lineage-defining variations appeared as iSNVs before becoming fixed in the population. In addition, mutations in RdRp as well as RNA editing by APOBEC and ADAR deaminases seem to contribute to the differential prevalence of iSNVs in hosts. We also observe hyper-variability at functionally critical residues in Spike protein that could alter the antigenicity and may contribute to immune escape. Thus, tracking and functional annotation of iSNVs in ongoing genome surveillance programs could be important for early identification of potential variants of concern and actionable interventions.

9.
Congenital heart disease (CHD) is a common group of birth defects with a strong genetic contribution to their etiology, but historically the diagnostic yield from exome studies of isolated CHD has been low. Pleiotropy, variable expressivity, and the difficulty of accurately phenotyping newborns contribute to this problem. We hypothesized that performing exome sequencing on selected individuals in families with multiple members affected by left-sided CHD, then filtering variants by population frequency, in silico predictive algorithms, and phenotypic annotations from publicly available databases would increase this yield and generate a list of candidate disease-causing variants that would show a high validation rate. In eight of the nineteen families in our study (42%), we established a well-known gene/phenotype link for a candidate variant or performed confirmation of a candidate variant’s effect on protein function, including variants in genes not previously described or firmly established as disease genes in the body of CHD literature: BMP10, CASZ1, ROCK1 and SMYD1. Two plausible variants in different genes were found to segregate in the same family in two instances suggesting oligogenic inheritance. These results highlight the need for functional validation and demonstrate that in the era of next-generation sequencing, multiplex families with isolated CHD can still bring high yield to the discovery of novel disease genes.

10.
Accurate identification of sparse heterozygous single-nucleotide variants (SNVs) is a critical challenge for identifying the causative mutations in mouse genetic screens, human genetic diseases and cancer. When seeking to identify causal DNA variants that occur at such low rates, they are overwhelmed by false-positive calls that arise from a range of technical and biological sources. We describe a strategy using whole-exome capture, massively parallel DNA sequencing and computational analysis, which identifies with a low false-positive rate the majority of heterozygous and homozygous SNVs arising de novo with a frequency of one nucleotide substitution per megabase in progeny of N-ethyl-N-nitrosourea (ENU)-mutated C57BL/6j mice. We found that by applying a strategy of filtering raw SNV calls against known and platform-specific variants we could call true SNVs with a false-positive rate of 19.4 per cent and an estimated false-negative rate of 21.3 per cent. These error rates are small enough to enable calling a causative mutation from both homozygous and heterozygous candidate mutation lists with little or no further experimental validation. The efficacy of this approach is demonstrated by identifying the causative mutation in the Ptprc gene in a lymphocyte-deficient strain and in 11 other strains with immune disorders or obesity, without the need for meiotic mapping. Exome sequencing of first-generation mutant mice revealed hundreds of unphenotyped protein-changing mutations, 52 per cent of which are predicted to be deleterious, which now become available for breeding and experimental analysis. We show that exome sequencing data alone are sufficient to identify induced mutations. This approach transforms genetic screens in mice, establishes a general strategy for analysing rare DNA variants and opens up a large new source for experimental models of human disease.

11.
Next-generation sequencing technologies have revolutionized our ability to identify genetic variants, either germline or somatic point mutations, that occur in cancer. Parallelization and miniaturization of DNA sequencing enables massive data throughput and for the first time, large-scale, nucleotide resolution views of cancer genomes can be achieved. Systematic, large-scale sequencing surveys have revealed that the genetic spectrum of mutations in cancers appears to be highly complex with numerous low frequency bystander somatic variations, and a limited number of common, frequently mutated genes. Large sample sizes and deeper resequencing are much needed in resolving clinical and biological relevance of the mutations as well as in detecting somatic variants in heterogeneous samples and cancer cell sub-populations. However, even with the next-generation sequencing technologies, the overwhelming size of the human genome and need for very high fold coverage represents a major challenge for up-scaling cancer genome sequencing projects. Assays to target, capture, enrich or partition disease-specific regions of the genome offer immediate solutions for reducing the complexity of the sequencing libraries. Integration of targeted DNA capture assays and next-generation deep resequencing improves the ability to identify clinically and biologically relevant mutations.

12.
13.
To date, the widely used genome-wide association studies (GWASs) of the human genome have reported thousands of variants that are significantly associated with various human traits. However, in the vast majority of these cases, the causal variants responsible for the observed associations remain unknown. In order to facilitate the identification of causal variants, we designed a simple computational method called the "preferential linkage disequilibrium (LD)" approach, which follows the variants discovered by GWASs to pinpoint the causal variants, even if they are rare compared with the discovery variants. The approach is based on the hypothesis that the GWAS-discovered variant is better at tagging the causal variants than are most other variants evaluated in the original GWAS. Applying the preferential LD approach to the GWAS signals of five human traits for which the causal variants are already known, we successfully placed the known causal variants among the top ten candidates in the majority of these cases. Application of this method to additional GWASs, including those of hepatitis C virus treatment response, plasma levels of clotting factors, and late-onset Alzheimer disease, has led to the identification of a number of promising candidate causal variants. This method represents a useful tool for delineating causal variants by bringing together GWAS signals and the rapidly accumulating variant data from next-generation sequencing.

14.
Structural variations are widespread in the human genome and can serve as genetic markers in clinical and evolutionary studies. With the advances in the next-generation sequencing technology, recent methods allow for identification of structural variations with unprecedented resolution and accuracy. They also provide opportunities to discover variants that could not be detected on conventional microarray-based platforms, such as dosage-invariant chromosomal translocations and inversions. In this review, we will describe some of the sequencing-based algorithms for detection of structural variations and discuss the key issues in future development.

15.
The identification of disease-causing mutations in next-generation sequencing (NGS) data requires efficient filtering techniques. In patients with rare recessive diseases, compound heterozygosity of pathogenic mutations is the most likely inheritance model if the parents are non-consanguineous. We developed a web-based compound heterozygous filter that is suited for data from NGS projects and that is easy to use for non-bioinformaticians. We analyzed the power of compound heterozygous mutation filtering by deriving background distributions for healthy individuals from different ethnicities and studied the effectiveness in trios as well as more complex pedigree structures. While usually more than 30 genes harbor potential compound heterozygotes in single exomes, this number can be markedly reduced with every additional member of the pedigree that is included in the analysis. In a real data set with exomes of four family members, two sisters affected by Mabry syndrome and their healthy parents, the disease-causing gene PIGO, which harbors the pathogenic compound heterozygous variants, could be readily identified. Compound heterozygous filtering is an efficient means to reduce the number of candidate mutations in studies aiming at identifying recessive disease genes in non-consanguineous families. A web-server is provided to make this filtering strategy available at www.gene-talk.de.
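
The general idea of a compound heterozygous filter in a parent-child trio can be sketched as follows. This is an illustrative assumption, not the implementation behind the web server named in the abstract: the `Variant` record, the genotype labels ("het", "hom", "ref") and the function `compound_het_genes` are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    variant_id: str
    child: str   # genotype label: "het", "hom", or "ref" (assumed encoding)
    mother: str
    father: str


def compound_het_genes(variants):
    """Return genes in which the child carries at least one heterozygous
    variant inherited only from the mother and at least one inherited
    only from the father."""
    maternal = defaultdict(list)
    paternal = defaultdict(list)
    for v in variants:
        if v.child != "het":
            continue
        if v.mother == "het" and v.father == "ref":
            maternal[v.gene].append(v.variant_id)
        elif v.father == "het" and v.mother == "ref":
            paternal[v.gene].append(v.variant_id)
    # Keep only genes with a hit on both parental sides.
    return {g: (maternal[g], paternal[g]) for g in maternal if g in paternal}


if __name__ == "__main__":
    trio = [
        Variant("GENE_A", "v1", "het", "het", "ref"),
        Variant("GENE_A", "v2", "het", "ref", "het"),
        Variant("GENE_B", "v3", "het", "het", "ref"),
    ]
    print(compound_het_genes(trio))
```

Adding further pedigree members, as the abstract describes, would tighten the filter by requiring the same pattern in affected siblings; that extension is not shown here.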

16.
Next-generation sequencing (NGS) has revolutionized genetics and enabled the accurate identification of many genetic variants across many genomes. However, detection of biologically important low-frequency variants within genetically heterogeneous populations remains challenging, because they are difficult to distinguish from intrinsic NGS sequencing error rates. Approaches to overcome these limitations are essential to detect rare mutations in large cohorts, virus or microbial populations, mitochondria heteroplasmy, and other heterogeneous mixtures such as tumors. Modifications in library preparation can overcome some of these limitations, but are experimentally challenging and restricted to skilled biologists. This paper describes a novel quality filtering and base pruning pipeline, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER), designed to detect sequence variants in a complex population with high sequence similarity derived from All-Codon-Scanning (ACS) mutagenesis. A novel fast alignment algorithm, designed for the specified application, has O(n) time complexity. CHOPER was applied to a p53 cancer mutant reactivation study derived from ACS mutagenesis. Relative to error filtering based on Phred quality scores, CHOPER improved accuracy by about 13% while discarding only half as many bases. These results are a step toward extending the power of NGS to the analysis of genetically heterogeneous populations.
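
The Phred-based error filtering that this abstract uses as its comparison baseline can be sketched briefly. This is not the CHOPER pipeline itself; the function names and the threshold of 20 are assumptions for illustration, and only the standard relation Q = -10 log10(P) is taken as given.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into the estimated base-call
    error probability, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def mask_low_quality(bases, quals, min_q=20):
    """Replace every base whose quality is below `min_q` with 'N'.
    The threshold of 20 (about a 1% error probability) is an assumed default."""
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


if __name__ == "__main__":
    print(phred_to_error_prob(30))
    print(mask_low_quality("ACGT", [30, 10, 25, 5]))
```

A filter of this kind discards bases on quality alone, which is the trade-off the abstract contrasts with its overlap-based approach.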

17.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer and the Ion TargetSeq™ Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3 % in four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than the SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific (89.6 %) were higher than those of TargetSeq-Proton-specific (15.8 %).

Conclusions

In the sequencing of exonic regions, combining two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.

18.
Aortic aneurysm and/or dissection (AAD) is a life-threatening condition, and several syndromes are known to be related to AAD. In this study, two new technologies, resequencing array technology (ResAT) and next-generation sequencing (NGS), were used to analyze eight genes associated with syndromic AAD in 70 patients with non-syndromic AAD. Eighteen sequence variants were detected using both ResAT and NGS. In addition one of these sequence variants was detected by ResAT only and two additional variants by NGS only. Three of the 18 variants are likely to be pathogenic (in 4.3% of AAD patients and in 8.6% of a subset of patients with thoracic AAD), highlighting the importance of genetic analysis in non-syndromic AAD. ResAT and NGS similarly detected most, but not all, of the variants. Resequencing array technology was a rapid and efficient method for detecting most nucleotide substitutions, but was unable to detect short insertions/deletions, and it is impractical to update custom arrays frequently. Next-generation sequencing was able to detect almost all types of mutation, but requires improved informatics methods.

19.
Mutations in mitochondrial DNA (mtDNA) may cause maternally-inherited cardiomyopathy and heart failure. In homoplasmy all mtDNA copies contain the mutation. In heteroplasmy there is a mixture of normal and mutant copies of mtDNA. The clinical phenotype of an affected individual depends on the type of genetic defect and the ratios of mutant and normal mtDNA in affected tissues. We aimed at determining the sensitivity of next-generation sequencing compared to Sanger sequencing for mutation detection in patients with mitochondrial cardiomyopathy. We studied 18 patients with mitochondrial cardiomyopathy and two with suspected mitochondrial disease. We “shotgun” sequenced PCR-amplified mtDNA and multiplexed using a single run on Roche's 454 Genome Sequencer. By mapping to the reference sequence, we obtained 1,300× average coverage per case and identified high-confidence variants. By comparing these to >400 mtDNA substitution variants detected by Sanger, we found 98% concordance in variant detection. Simulation studies showed that >95% of the homoplasmic variants were detected at a minimum sequence coverage of 20× while heteroplasmic variants required >200× coverage. Several Sanger “misses” were detected by 454 sequencing. These included the novel heteroplasmic 7501T>C in tRNA serine 1 in a patient with sudden cardiac death. These results support a potential role of next-generation sequencing in the discovery of novel mtDNA variants with heteroplasmy below the level reliably detected with Sanger sequencing. We hope that this will assist in the identification of mtDNA mutations and key genetic determinants for cardiomyopathy and mitochondrial disease.

20.
Duchenne/Becker muscular dystrophies are the most frequent inherited neuromuscular diseases caused by mutations of the dystrophin gene. However, approximately 30 % of patients with the disease do not receive a molecular diagnosis because of the complex mutational spectrum and the large size of the gene. The introduction and use of next-generation sequencing have advanced clinical genetic research and might be a suitable method for the detection of various types of mutations in the dystrophin gene. To identify the mutational spectrum using a single platform, whole dystrophin gene sequencing was performed using next-generation sequencing. The entire dystrophin gene, including all exons, introns and promoter regions, was target enriched using a DMD whole gene enrichment kit. The enrichment libraries were sequenced on an Illumina HiSeq 2000 sequencer using paired read 100 bp sequencing. We studied 26 patients: 21 had known large deletion/duplications and 5 did not have detectable large deletion/duplications by multiplex ligation-dependent probe amplification technology (MLPA). We applied whole dystrophin gene analysis by next-generation sequencing to the five patients who did not have detectable large deletion/duplications and to five randomly chosen patients from the 21 who did have large deletion/duplications. The sequencing data covered almost 100 % of the exonic region of the dystrophin gene by ≥10 reads with a mean read depth of 147. Five small mutations were identified in the first five patients, of which four variants were unreported in the dmd.nl database. The deleted or duplicated exons and the breakpoints in the five large deletion/duplication patients were precisely identified. Whole dystrophin gene sequencing by next-generation sequencing may be a useful tool for the genetic diagnosis of Duchenne and Becker muscular dystrophies.
