Found 20 similar articles; search time: 15 ms
1.
PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals (total citations: 1, self-citations: 0, citations by others: 1)
Kofler R Orozco-terWengel P De Maio N Pandey RV Nolte V Futschik A Kosiol C Schlötterer C 《PloS one》2011,6(1):e15925
Recent statistical analyses suggest that sequencing of pooled samples provides a cost effective approach to determine genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θ(Watterson), θ(π), and Tajima's D that account for the bias introduced by pooling and sequencing errors, as well as divergence between species. Results of genome-wide analyses can be graphically displayed in a sliding window plot. PoPoolation is written in Perl and R and it builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data.
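For orientation, the three statistics named in this abstract can be sketched in their classical form, for individually sequenced, aligned haplotypes. This is a minimal illustration only: it does not include the pooling or sequencing-error corrections that PoPoolation applies, and the function name `summary_stats` is hypothetical, not part of the toolbox.

```python
from itertools import combinations
from math import sqrt


def summary_stats(seqs):
    """Return (S, theta_w, pi, tajimas_d) for equal-length aligned sequences.

    Classical (unpooled) estimators; assumes no missing data.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")

    # S: number of segregating (polymorphic) alignment columns
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # pi: mean number of pairwise differences between sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's theta: S scaled by the harmonic number a1
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Tajima's D is undefined without segregating sites, and its variance
    # term is zero for n < 4
    if s == 0 or n < 4:
        return s, theta_w, pi, float("nan")

    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, theta_w, pi, d


# Four toy haplotypes with a single singleton variant
print(summary_stats(["AT", "AA", "AA", "AA"]))
```

A singleton variant pulls π below θ(Watterson), so Tajima's D is negative for this toy alignment.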
2.
Background
With an estimated 38 million people worldwide currently infected with human immunodeficiency virus (HIV), and an additional 4.1 million people becoming infected each year, it is important to understand how this virus mutates and develops resistance in order to design successful therapies.
Methodology/Principal Findings
We report a novel experimental method for amplifying full-length HIV genomes without the use of sequence-specific primers for high throughput DNA sequencing, followed by assembly of full length viral genome sequences from the resulting large dataset. Illumina was chosen for sequencing due to its ability to provide greater coverage of the HIV genome compared to prior methods, allowing for more comprehensive characterization of the heterogeneity present in the HIV samples analyzed. Our novel amplification method in combination with Illumina sequencing was used to analyze two HIV populations: a homogeneous HIV population based on the canonical NL4-3 strain and a heterogeneous viral population obtained from an HIV patient's infected T cells. In addition, the resulting sequence was analyzed using a new computational approach to obtain a consensus sequence and several metrics of diversity.
Significance
This study demonstrates how a lower bias amplification method in combination with next generation DNA sequencing provides in-depth, complete coverage of the HIV genome, enabling a stronger characterization of the quasispecies present in a clinically relevant HIV population as well as future study of how HIV mutates in response to a selective pressure.
3.
Emonet S Grard G Brisbarre N Moureau G Temmam S Charrel R de Lamballerie X 《Biochemical and biophysical research communications》2006,344(4):1080-1085
Here, we propose an optimised protocol (LoPPS, long PCR product sequencing) which allows the fast, cost-attractive, and high-throughput sequencing of long PCR products. LoPPS constitutes an alternative to the primer-walking technology which is expensive and time consuming but remains the current standard procedure. It is based on the ultrasonic shearing, polishing, and cloning of PCR or RT-PCR products and is compatible with 96- or 384-well microplate systems in which bacterial growth, preparation of plasmid DNA, and sequencing can be automated. We present results obtained from 24 different RT-PCR products (2.5-4.8 kbp long) obtained from various RNA viruses and fully sequenced using LoPPS. The method proved to be robust and fast. It was successfully used on a low amount of DNA and allowed each target nucleotide position to be controlled twice or more, with a final cost which is one-third of that of primer-walking.
4.
Olivier Harismendy Pauline C Ng Robert L Strausberg Xiaoyun Wang Timothy B Stockwell Karen Y Beeson Nicholas J Schork Sarah S Murray Eric J Topol Samuel Levy Kelly A Frazer 《Genome biology》2009,10(3):R32-13
Background
Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.
Results
Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.
Conclusions
Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.
5.
6.
Long Q Jeffares DC Zhang Q Ye K Nizhynska V Ning Z Tyler-Smith C Nordborg M 《PloS one》2011,6(1):e15292
With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it's difficult or impossible to partition the subtypes experimentally before sequencing, and those subtype frequencies must hence be inferred. In addition, investigators may occasionally want to artificially pool the sample of a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and users manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.
7.
Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
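As a rough illustration of the kind of read-level quality filtering this abstract describes, the sketch below keeps a FASTQ record only if a minimum fraction of its bases reach a Phred quality cutoff. It is not the toolkit's own code (which is written in Perl): the function names, the Phred+33 encoding, and the thresholds (Q ≥ 20 for at least 70% of bases) are assumptions chosen for the example.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_qc(quality, min_q=20, min_fraction=0.7, offset=33):
    """True if at least min_fraction of bases have a Phred score >= min_q."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False
    good = sum(q >= min_q for q in scores)
    return good / len(scores) >= min_fraction


def filter_fastq(lines, **kwargs):
    """Yield (header, sequence, quality) for 4-line FASTQ records passing QC."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if passes_qc(qual, **kwargs):
            yield header, seq, qual


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
fastq = ["@r1", "ACGT", "+", "IIII",
         "@r2", "ACGT", "+", "##II"]
for header, seq, qual in filter_fastq(fastq):
    print(header, seq)
```

Only `@r1` survives here, because half of the bases in `@r2` fall below the cutoff. Passing `min_q` or `min_fraction` as keyword arguments changes the thresholds, mirroring the user-defined parameters the abstract mentions.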
8.
9.
10.
11.
Kevin Stevenson 《Journal of biomolecular structure & dynamics》2020,38(12):3730-3735
Communicated by Ramaswamy H. Sarma
12.
Sulonen AM Ellonen P Almusa H Lepistö M Eldfors S Hannula S Miettinen T Tyynismaa H Salo P Heckman C Joensuu H Raivio T Suomalainen A Saarela J 《Genome biology》2011,12(9):R94-18
Background
Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.
Results
We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.
Conclusions
Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.
13.
Adult-onset neurodegenerative disorders are disabling and often fatal diseases of the nervous system whose underlying mechanisms of cell death remain unknown. Defects in mitochondrial respiration had previously been proposed to contribute to the occurrence of many, if not all, of the most common neurodegenerative disorders. However, the discovery of genes mutated in hereditary forms of these enigmatic diseases has additionally suggested defects in mitochondrial dynamics. Such disturbances can lead to changes in mitochondrial trafficking, in interorganellar communication, and in mitochondrial quality control. These new mechanisms by which mitochondria may also be linked to neurodegeneration will likely have far-reaching implications for our understanding of the pathophysiology and treatment of adult-onset neurodegenerative disorders.
14.
Anthony H. Stobbe Jon Daniels Andres S. Espindola Ruchi Verma Ulrich Melcher Francisco Ochoa-Corona Carla Garzon Jacqueline Fletcher William Schneider 《Journal of microbiological methods》2013
Plant biosecurity requires rapid identification of pathogenic organisms. While there are many pathogen-specific diagnostic assays, the ability to test for large numbers of pathogens simultaneously is lacking. Next generation sequencing (NGS) allows one to detect all organisms within a given sample, but has computational limitations during assembly and similarity searching of sequence data which extend the time needed to make a diagnostic decision. To minimize the amount of bioinformatic processing time needed, unique pathogen-specific sequences (termed e-probes) were designed to be used in searches of unassembled, non-quality checked, sequence data. E-probes have been designed and tested for several selected phytopathogens, including an RNA virus, a DNA virus, bacteria, fungi, and an oomycete, illustrating the ability to detect several diverse plant pathogens. E-probes of 80 or more nucleotides in length provided satisfactory levels of precision (75%). The number of e-probes designed for each pathogen varied with the genome size of the pathogen. To give confidence to diagnostic calls, a statistical method of determining the presence of a given pathogen was developed, in which target e-probe signals (detection signal) are compared to signals generated by a decoy set of e-probes (background signal). The E-probe Diagnostic Nucleic acid Analysis (EDNA) process provides the framework for a new sequence-based detection system that eliminates the need for assembly of NGS data.
15.
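This study aimed to establish a method for constructing genomic target-sequence capture libraries and, combined with second-generation sequencing technology, to achieve deep sequencing of candidate gene regions. Using Agilent's eArray online platform, 120-base capture probes (baits) were designed for 2,414,977 bp of genomic sequence covering 11,824 exons of 1,250 genes, and prepared as a SureSelect solution-phase target capture reagent. Two human genomic DNA samples were sheared by sonication, end-repaired and phosphorylated, and ligated to SOLiD adaptors; DNA fragments of 150 bp to 200 bp were recovered and hybridized with the target probes to capture the target sequences. After water-in-oil emulsion PCR amplification and magnetic-bead separation and enrichment, the libraries were loaded onto the SOLiD sequencing system either for library quality evaluation by workflow analysis (WFA) or for a full sequencing run. In total, 46,509 capture probes were designed and synthesized for the 11,147 exon fragments included, and prepared as a SureSelect kit. The probes effectively captured and enriched the target fragments from genomic DNA; quantitative PCR showed enrichment of up to 29-fold. WFA indicated that the libraries were suitable for a full sequencing run on the SOLiD instrument. Sequencing showed that reads in the target regions accounted for 70% of the total valid reads, with coverage above 200× in all cases. These results indicate that the SureSelect-based workflow established here for capturing and enriching genomic target sequences to build sequencing libraries is feasible and can be used directly for sequencing on the SOLiD platform.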
16.
Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons (total citations: 1, self-citations: 0, citations by others: 1)
Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
17.
18.
19.
20.
The enrichment of targeted regions within complex next generation sequencing libraries commonly uses biotinylated baits to capture the desired sequences. This method results in high read coverage over the targets and their flanking regions. Oxford Nanopore Technologies recently released a USB 3.0-interfaced sequencer, the MinION. To date, no particular method for enriching MinION libraries has been standardized. Here, using biotinylated PCR-generated baits in a novel approach, we describe a simple and efficient way for multiplexed enrichment of MinION libraries, overcoming technical limitations related to the chemistry of the sequencing adapters and the length of the DNA fragments. Using Phage Lambda and Escherichia coli as models, we selectively enrich for specific targets, significantly increasing the corresponding read coverage and eliminating unwanted regions. We show that by capturing genomic fragments that contain the target sequences, we recover reads extending beyond the targeted regions, which can thus be used for the determination of potentially unknown flanking sequences. By pooling enriched libraries derived from two distinct E. coli strains and analyzing them in parallel, we demonstrate the efficiency of this method in multiplexed format. Crucially, we evaluated the optimal bait size for large fragment libraries, and we describe for the first time a standardized method for target enrichment on the MinION platform.