Similar articles
Found 20 similar articles.
1.
We developed a generalized framework for multiplexed resequencing of targeted human genome regions on the Illumina Genome Analyzer using degenerate indexed DNA bar codes ligated to fragmented DNA before sequencing. Using this method, we simultaneously sequenced the DNA of multiple HapMap individuals at several Encyclopedia of DNA Elements (ENCODE) regions. We then evaluated the use of Bayes factors for discovering and genotyping polymorphisms. For polymorphisms that were either previously identified within the Single Nucleotide Polymorphism database (dbSNP) or visually evident upon re-inspection of archived ENCODE traces, we observed a false positive rate of 11.3% using strict thresholds for predicting variants and 69.6% for lax thresholds. Conversely, false negative rates were 10.8-90.8%, with false negatives at stricter cut-offs occurring at lower coverage (<10 aligned reads). These results suggest that >90% of genetic variants are discoverable using multiplexed sequencing provided sufficient coverage at the polymorphic base.
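As a rough illustration of why coverage matters when Bayes factors are used for genotyping, the sketch below compares a heterozygous genotype against a homozygous-reference genotype at a single site. It is not the authors' model: the binomial likelihoods, the 1% sequencing error rate, and the function names are assumptions made for illustration only.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_factor_het_vs_ref(alt_reads: int, depth: int, error_rate: float = 0.01) -> float:
    """Bayes factor for 'heterozygous' against 'homozygous reference'.

    Assumed model: under a heterozygous genotype each read shows the
    alternate allele with probability 0.5; under homozygous reference the
    alternate allele appears only through sequencing error.
    """
    like_het = binom_pmf(alt_reads, depth, 0.5)
    like_ref = binom_pmf(alt_reads, depth, error_rate)
    return like_het / like_ref


# The same 50% alternate-allele fraction gives far weaker evidence at low depth.
low_cov = bayes_factor_het_vs_ref(alt_reads=1, depth=2)
high_cov = bayes_factor_het_vs_ref(alt_reads=5, depth=10)
```

Under these assumptions the evidence for a variant grows quickly with depth, which is consistent with the abstract's observation that false negatives at strict cut-offs concentrate at sites with fewer than 10 aligned reads.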

2.
3.
Next-generation DNA sequencing has revolutionized the field of genetics and genomics, providing researchers with the tools to efficiently identify novel rare and low-frequency risk variants, which was not practical with previously available methodologies. These methods allow for the sequence capture of a specific locus or small genetic region all the way up to the entire six billion base pairs of the diploid human genome. Rheumatic diseases are a huge burden on the US population, affecting more than 46 million Americans. Those afflicted suffer from one or more of the more than 100 diseases characterized by inflammation and loss of function, mainly of the joints, tendons, ligaments, bones, and muscles. While genetic studies of many of these diseases (for example, systemic lupus erythematosus, rheumatoid arthritis, and inflammatory bowel disease) have had major successes in defining their genetic architecture, causal alleles and rare variants have remained elusive. This review describes the high-throughput DNA sequencing methodologies currently available commercially and their application to rheumatic diseases in both case-control and family-based studies.

4.
5.
Molecular Biology Reports - Mitochondrial diseases are a clinically heterogeneous group of multisystemic disorders that arise as a result of various mitochondrial dysfunctions. Autosomal recessive...

6.
A thorough understanding of the relationships between plants and pathogens is essential if we are to continue to meet the agricultural needs of the world's growing population. The identification of genes underlying important quantitative trait loci is extremely challenging in complex genomes such as Brassica napus (canola, oilseed rape or rapeseed). However, recent advances in next-generation sequencing (NGS) enable much quicker identification of candidate genes for traits of interest. Here, we demonstrate this with the identification of candidate disease resistance genes from B. napus for its most devastating fungal pathogen, Leptosphaeria maculans (blackleg fungus). These two species are locked in an evolutionary arms race whereby a gene-for-gene interaction confers either resistance or susceptibility in the plant depending on the genotype of the plant and pathogen. Preliminary analysis of the complete genome sequence of Brassica rapa, the diploid progenitor of B. napus, identified numerous candidate genes with disease resistance characteristics, several of which were clustered around a region syntenic with a major locus (Rlm4) for blackleg resistance on A7 of B. napus. Molecular analyses of the candidate genes using B. napus NGS data are presented, and the difficulties associated with identifying functional gene copies within the highly duplicated Brassica genome are discussed.

7.
Highly abundant microRNAs (miRNAs) in small RNA sequencing libraries make it difficult to obtain efficient measurements of less abundant species. We present a new method that allows for the selective blocking of specific, abundant miRNAs during preparation of sequencing libraries. This technique is specific, with few off-target effects, and has no impact on the reproducibility of the measurement of non-targeted species. In human plasma samples, we demonstrate that blocking of the highly abundant hsa-miR-16-5p leads to improved detection of lowly expressed miRNAs and more precise measurement of differential expression overall. Furthermore, we establish the ability to target a second abundant miRNA and to multiplex the blocking of two miRNAs simultaneously. For small RNA sequencing, this technique could fill a role similar to that of ribosomal or globin removal technologies in messenger RNA sequencing.

8.
9.

Background

The discovery and mapping of genomic variants is an essential step in most analyses performed using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that are used to identify other types of variants. Indels represent the second most frequent class of variants in the human genome, after SNPs. The reliable detection of indels is still a challenging problem, especially for variants that are longer than a few bases.

Results

We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints for indels smaller than a user-specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed by the split-read approach. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools.

Conclusions

indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0483-6) contains supplementary material, which is available to authorized users.
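The paired-end idea mentioned in the indelMINER abstract above (larger variants show up as read pairs whose apparent insert size is unusual) can be sketched in a few lines. This is not indelMINER's algorithm: the median/MAD outlier rule, the default cut-off of 5 MADs, and the function name are assumptions chosen for illustration.

```python
from statistics import median


def flag_discordant_pairs(insert_sizes, n_mads=5.0):
    """Return indices of read pairs whose insert size is an outlier.

    Assumed rule: a pair is discordant when its insert size lies more than
    n_mads median absolute deviations (MADs) from the library median.
    A pair mapping much farther apart than expected hints at a deletion;
    a pair mapping much closer hints at an insertion.
    """
    med = median(insert_sizes)
    mad = median([abs(size - med) for size in insert_sizes])
    threshold = n_mads * max(mad, 1.0)  # guard against a zero MAD
    return [i for i, size in enumerate(insert_sizes) if abs(size - med) > threshold]


# Five ordinary pairs around 300 bp and one pair spanning a putative deletion.
sizes = [300, 310, 295, 305, 290, 1200]
discordant = flag_discordant_pairs(sizes)
```

A real caller would additionally cluster discordant pairs by position and require several supporting pairs before reporting a variant.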

10.
Background

Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).

Results

We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP calling with a new, faster approach: target-region mapping with subsequent 'read-backmapping' to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.

Conclusions

We recommend applying our general 'two-step' mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP concordances and inspecting read alignments in order to attain more confident results.
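One simple way to put a number on the inter-sample SNP concordance recommended above is to compare two call sets directly. The abstract does not say which measure the authors used; the Jaccard index below, the tuple layout of a call, and the example coordinates are all assumptions for illustration.

```python
def snp_concordance(calls_a, calls_b):
    """Jaccard concordance between two SNP call sets.

    Each call is assumed to be a hashable tuple such as
    (chromosome, position, ref_allele, alt_allele).
    Returns shared calls divided by all distinct calls (1.0 if both are empty).
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical call sets for two replicate samples.
sample_1 = [("chr17", 100, "A", "G"), ("chr17", 200, "C", "T")]
sample_2 = [("chr17", 100, "A", "G"), ("chr13", 50, "G", "A")]
concordance = snp_concordance(sample_1, sample_2)
```

Calls that appear in only one replicate are natural candidates for the manual inspection of read alignments that the conclusions also recommend.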

11.
12.
13.
Linseed (Linum usitatissimum L.) is regarded as a cash crop of tomorrow because of the presence of nutraceutically important α-linolenic acid (ALA) and lignan. However, only limited breeding progress has been made in this crop, mainly due to the lack of sufficient genetic and genomic resources. Among these, simple sequence repeats (SSR) are useful DNA markers for diversity analysis, genetic mapping and tagging traits because of their co-dominant and highly polymorphic nature. In order to develop SSR markers for linseed, we used three microsatellite isolation methods, viz., PCR Isolation of Microsatellite Arrays (PIMA), 5′-anchored PCR method, and Fast Isolation by AFLP of Sequences COntaining repeats (FIASCO). The amplified products from these methods were pooled and sequenced using the 454 GS-FLX platform. A total of 36,332 reads were obtained, which assembled into 2,183 contigs and 2,509 singlets. The contigs and the singlets contained 1,842 microsatellite motifs, with dinucleotide motifs as the most abundant repeat type (54%) followed by trinucleotide motifs (44%). Based on this, 290 SSR markers were designed, 52 of which were evaluated using a panel of 27 diverse linseed genotypes. Among the three enrichment methods, the 5′-anchored PCR method was most efficient for isolation of microsatellites, while FIASCO was most efficient for developing SSR markers. We show the utility of next-generation sequencing technology for efficiently discovering a large number of microsatellite markers in non-model plants.
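To make the notion of di- and trinucleotide microsatellite motifs concrete, the sketch below scans a sequence for tandem repeats with a regular expression. The abstract does not name the authors' SSR-detection software; the minimum of four repeat units and the function name are assumptions for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Find di- and trinucleotide tandem repeats in a DNA sequence.

    Returns a list of (start_index, motif, repeat_count) tuples.
    Runs of a single base (e.g. AAAAAA) are skipped, since they are
    homopolymers rather than di- or trinucleotide motifs.
    """
    # A 2-3 base motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy contig containing an (AC)5 repeat and an (ATG)4 repeat.
contig = "TTG" + "AC" * 5 + "CC" + "ATG" * 4 + "CC"
ssrs = find_ssrs(contig)
```

Primer design for a marker would then target the unique sequence flanking each reported repeat.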

14.

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.
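The 0.5% detection limit reported in the results above amounts to a frequency filter on per-position base counts. The sketch below shows that filter only; the input layout (a dictionary of base counts at one position) and the function name are assumptions, and the authors' pipeline is not reproduced here.

```python
def call_minor_variants(base_counts, reference_base, min_frequency=0.005):
    """Return non-reference bases whose frequency reaches the threshold.

    base_counts maps each observed base to its read count at one position.
    The 0.5% default mirrors the detection limit quoted in the abstract;
    variants below it cannot be separated from RT-PCR and sequencing errors.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth >= min_frequency
    }


# Hypothetical pileup: G is seen in 0.6% of reads, T in 0.4%.
variants = call_minor_variants({"A": 9900, "G": 60, "T": 40}, reference_base="A")
```

In this hypothetical example only G passes the filter, while T is treated as indistinguishable from background error.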

15.
Autoinflammatory diseases are one of a group of primary immunodeficiency diseases that are generally thought to be caused by mutations in genes responsible for innate immunity, rather than acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, the MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence added by the second-PCR primers following the first PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program (which detects and avoids common SNPs in gene-specific PCR primers), we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any medical office staff. Because of the small PCR amplification bias, we can simultaneously analyze several clinical DNA samples at low cost and can obtain sufficient read numbers to detect a low level of somatic mosaic mutations.

16.
17.
18.

Background

Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.

Results

We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced.

Conclusions

Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.
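The recommendation in the amplicon abstract above (keep primer bases during alignment, and exclude them only at the variant calling step) can be sketched as a late coordinate filter. The half-open coordinate convention, the example amplicon, and the function names are assumptions for illustration and do not reproduce the authors' pipeline.

```python
def insert_region(amplicon_start, amplicon_end, fwd_primer_len, rev_primer_len):
    """Half-open [start, end) interval of the amplicon insert, primers excluded."""
    return amplicon_start + fwd_primer_len, amplicon_end - rev_primer_len


def filter_calls_to_insert(variant_positions, amplicon_start, amplicon_end,
                           fwd_primer_len, rev_primer_len):
    """Keep only variant positions that fall inside the amplicon insert.

    Reads are assumed to have been aligned with their primer bases intact,
    so alignments near the insert boundaries are anchored by primer sequence;
    positions inside the primer binding regions are discarded only here.
    """
    start, end = insert_region(amplicon_start, amplicon_end,
                               fwd_primer_len, rev_primer_len)
    return [pos for pos in variant_positions if start <= pos < end]


# Hypothetical amplicon spanning [1000, 1200) with 20 bp and 22 bp primers.
candidates = [1005, 1020, 1100, 1177, 1178, 1190]
kept = filter_calls_to_insert(candidates, 1000, 1200, 20, 22)
```

Filtering this late means a variant at the first or last insert base is still supported by reads that extend into the primer, which is the point the conclusions make about bases beyond the variant site.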

19.
20.
Formalin-fixed paraffin-embedded (FFPE) tissues are the standard diagnostic material in pathology laboratories. However, admixture of unwanted tissues and a shortage of normal samples, which can be used to detect somatic mutations, are considered critical obstacles to accurate cancer diagnosis. To address these challenges, we sorted pure tumor cells from 22 FFPE lung adenocarcinoma tissues via Di-Electro-Phoretic Array (DEPArray) technology, a new cell sorting technology, and analyzed the variants with next-generation sequencing (NGS). The allele frequencies of all gene mutations were improved by 1.2 times in cells sorted via DEPArray (tumor suppressor genes, 1.3–10.1 times; oncogenes, 1.3–2.6 times). We identified 16 novel mutations by sequencing cells sorted via DEPArray technology, compared with 4 novel mutations detected by sequencing unsorted cells. Using this analysis, we also revealed that six genes (TP53, EGFR, PTEN, RB1, KRAS, and CTNNB1) were somatically mutated in multiple homogeneous lung adenocarcinomas. In summary, we sorted pure tumor cells from 22 FFPE lung adenocarcinomas by DEPArray technology and identified 16 novel somatic mutations. We also established the precise genomic landscape for more accurate diagnosis in 22 lung adenocarcinomas with mutations detected in pure tumor cells. The results obtained in this study could offer new avenues for the treatment and the diagnosis of squamous cell lung cancers.
