共查询到20条相似文献,搜索用时 15 毫秒
1.
Sébastien Rodrigue Arne C. Materna Sonia C. Timberlake Matthew C. Blackburn Rex R. Malmstrom Eric J. Alm Sallie W. Chisholm 《PloS one》2010,5(7)
Background
Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved.Methodology/Principal Findings
We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.Conclusions/Significance
This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing. 相似文献2.
Structural variation (SV) has been reported to be associated with numerous diseases such as cancer. With the advent of next generation sequencing (NGS) technologies, various types of SV can be potentially identified. We propose a model based clustering approach utilizing a set of features defined for each type of SV events. Our method, termed SVMiner, not only provides a probability score for each candidate, but also predicts the heterozygosity of genomic deletions. Extensive experiments on genome-wide deep sequencing data have demonstrated that SVMiner is robust against the variability of a single cluster feature, and it significantly outperforms several commonly used SV detection programs. SVMiner can be downloaded from http://cbc.case.edu/svminer/. 相似文献
3.
Asan Chunyu Geng Yan Chen Kui Wu Qingle Cai Yu Wang Yongshan Lang Hongzhi Cao Huangming Yang Jian Wang Xiuqing Zhang 《PloS one》2012,7(9)
Background
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.Results
We found that the method performs stably for PE sequencing of 2- to 5- kb DNA fragments, and can be extended to 10–20 kb (and even in extremes, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and develop a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over100-times greater than the initial size produced with only small insert PE reads(17 kb). In addition, we mapped two 7- to 8- kb insertions in the YH genome using the larger insert sizes of the long-range PE data.Conclusions
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads. 相似文献4.
Jean-Michel Claverie 《Genomics》1994,23(3)
The random (shotgun) DNA sequencing strategy is used for most large-scale sequencing projects, including the identification of human disease genes after positional cloning. The principle of the method--sequence assembly from overlap--requires the candidate gene region to be partitioned into 15- to 20-kb pieces (usually λ inserts), themselves randomly subcloned into M13 prior to sequencing with a 6- to 8-fold redundancy. Most often, a time-consuming directed strategy must be invoked to close the remaining gaps. Ultimately, computer-based methods are invoked to locate putative coding exons within the finished genomic sequence. Given the small average size of vertebrate exons, I show here that they can be detected from the computer analysis of the individual runs, much before completion of contiguity. However, the successful assessment of coding potential from the raw data depends on a combination of new sequence masking techniques. When the identification of coding exons is the primary goal, the usual random sequencing strategy can thus be greatly optimized. The streamlined approach requires only a 2- to 2.5-fold sequencing redundancy, can dispense with the subcloning in λ and the closure of gaps, and can be fully automated. The feasibility of this strategy is demonstrated using data from the X-linked Kallmann syndrome gene region. 相似文献
5.
6.
Steven Sijmons Kim Thys Micha?l Corthout Ellen Van Damme Marnix Van Loock Stefanie Bollen Sylvie Baguet Jeroen Aerssens Marc Van Ranst Piet Maes 《PloS one》2014,9(4)
Human cytomegalovirus (HCMV) is a ubiquitous virus that can cause serious sequelae in immunocompromised patients and in the developing fetus. The coding capacity of the 235 kbp genome is still incompletely understood, and there is a pressing need to characterize genomic contents in clinical isolates. In this study, a procedure for the high-throughput generation of full genome consensus sequences from clinical HCMV isolates is presented. This method relies on low number passaging of clinical isolates on human fibroblasts, followed by digestion of cellular DNA and purification of viral DNA. After multiple displacement amplification, highly pure viral DNA is generated. These extracts are suitable for high-throughput next-generation sequencing and assembly of consensus sequences. Throughout a series of validation experiments, we showed that the workflow reproducibly generated consensus sequences representative for the virus population present in the original clinical material. Additionally, the performance of 454 GS FLX and/or Illumina Genome Analyzer datasets in consensus sequence deduction was evaluated. Based on assembly performance data, the Illumina Genome Analyzer was the platform of choice in the presented workflow. Analysis of the consensus sequences derived in this study confirmed the presence of gene-disrupting mutations in clinical HCMV isolates independent from in vitro passaging. These mutations were identified in genes RL5A, UL1, UL9, UL111A and UL150. In conclusion, the presented workflow provides opportunities for high-throughput characterization of complete HCMV genomes that could deliver new insights into HCMV coding capacity and genetic determinants of viral tropism and pathogenicity. 相似文献
7.
Elisa Contini Irene Paganini Roberta Sestini Luisa Candita Gabriele Lorenzo Capone Lorenzo Barbetti Serena Falconi Sabrina Frusconi Irene Giotti Costanza Giuliani Francesca Torricelli Matteo Benelli Laura Papi 《PloS one》2015,10(6)
The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF) followed by a tailored statistical analysis. This method allowed to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design. 相似文献
8.
9.
Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.
This is part of the PLOS Computational Biology Education collection.相似文献
10.
11.
Xiao Zhu Henry C. M. Leung Francis Y. L. Chin Siu Ming Yiu Guangri Quan Bo Liu Yadong Wang 《PloS one》2014,9(12)
Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly which plays significant roles in many applications remains a great challenge. Most of the state-of-the-art approaches base on de Bruijn graph strategy and overlap-layout strategy. However, these approaches which depend on k-mers or read overlaps do not fully utilize information of paired-end and single-end reads when resolving branches. Since they treat all single-end reads with overlapped length larger than a fix threshold equally, they fail to use the more confident long overlapped reads for assembling and mix up with the relative short overlapped reads. Moreover, these approaches have not been special designed for handling tandem repeats (repeats occur adjacently in the genome) and they usually break down the contigs near the tandem repeats. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach, which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds using paired-end reads and different read overlap size ranging from O
max to O
min to resolve the gaps and branches. By constructing a decision model using machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA will try to extend the contig by all feasible extensions and determine the correct extension by using look-ahead approach. Many difficult-resolved branches are due to tandem repeats which are close in the genome. PERGA detects such different copies of the repeats to resolve the branches to make the extension much longer and more accurate. We evaluated PERGA on both Illumina real and simulated datasets ranging from small bacterial genomes to large human chromosome, and it constructed longer and more accurate contigs and scaffolds than other state-of-the-art assemblers. PERGA can be freely downloaded at https://github.com/hitbio/PERGA. 相似文献
12.
Jiang Du Robert D. Bjornson Zhengdong D. Zhang Yong Kong Michael Snyder Mark B. Gerstein 《PLoS computational biology》2009,5(7)
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost. 相似文献
13.
Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests were comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations. 相似文献
14.
15.
The genetic codon UGA has a dual function: serving as a terminator and encoding selenocysteine. However, most popular gene annotation programs only take it as a stop signal, resulting in misannotation or completely missing selenoprotein genes. We developed a computational method named Asec-Prediction that is specific for the prediction of archaeal selenoprotein genes. To evaluate its effectiveness, we first applied it to 14 archaeal genomes with previously known selenoprotein genes, and Asec-Prediction identified all reported selenoprotein genes without redundant results. When we applied it to 12 archaeal genomes that had not been researched for selenoprotein genes, Asec-Prediction detected a novel selenoprotein gene in Methanosarcina acetivorans. Further evidence was also collected to support that the predicted gene should be a real selenoprotein gene. The result shows that Asec-Prediction is effective for the prediction of archaeal selenoprotein genes. 相似文献
16.
17.
Due to the growth of interest in single-cell genomics, computational methods for distinguishing true variants from artifacts are highly desirable. While special attention has been paid to false positives in variant or mutation calling from single-cell sequencing data, an equally important but often neglected issue is that of false negatives derived from allele dropout during the amplification of single cell genomes. In this paper, we propose a simple strategy to reduce the false negatives in single-cell sequencing data analysis. Simulation results show that this method is highly reliable, with an error rate of 4.94×10-5, which is orders of magnitude lower than the expected false negative rate (~34%) estimated from a single-cell exome dataset, though the method is limited by the low SNP density in the human genome. We applied this method to analyze the exome data of a few dozen single tumor cells generated in previous studies, and extracted cell specific mutation information for a small set of sites. Interestingly, we found that there are difficulties in using the classical clonal model of tumor cell growth to explain the mutation patterns observed in some tumor cells. 相似文献
18.
Nancy B. Y. Tsui Peiyong Jiang Katherine C. K. Chow Xiaoxi Su Tak Y. Leung Hao Sun K. C. Allen Chan Rossa W. K. Chiu Y. M. Dennis Lo 《PloS one》2012,7(10)
Background
Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial. The issue is due to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine.Methodology
We have used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancies. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of the single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine.Principal Findings
Cell-free fetal DNA was detected in five of the seven maternal urine samples, with the fractional fetal DNA concentrations ranged from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact when compared with plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length.Conclusions
With the use of massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine, and have shown that urinary fetal DNA was heavily degraded. 相似文献19.
A novel procedure was used for cloning large adenovirus genome fragment by the homologous recombination in E.coli strain BJ5183. The 11.2Kb downstream fragment of the CAV-2 strain YCA18 genome was cloned by homologous recombination, the 1029bp left end and the 970bp fight end of this fragment were separately amplified by PCR. They were then cloned into plasmid pPoly2 with direction from left fragment to fight fragment, obtaining a “rescue” plasmid pT615. The pT615 was liberalized by Hind Ⅲ and PstⅠ digestion and was cotransformed with the purified CAV-2 genome which was cut by BstBI into competent E.coli strain BJ5183. Recombinant plasmids harboring the 11.2Kb downstream fragment of CAV-2 genome were obtained after bacterial intermolecular homologous recombination. The recombinant efficiency of all E.coli strains tested was 78.3%. One of the recombinant plasmids, pT618, was further identified by enzyme digestion analysis and PCR amplification. The results showed the plasmids contained the 11.2kb fragment downstream the genome of CAV-2. 相似文献