Similar literature
20 similar articles found (search time: 15 ms)
1.
Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intrahost viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance and are thus not suited for full viral genome analysis. Here, we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to provide quality output properly formatted for downstream evolutionary analyses.
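To make concrete what is lost when a sample is summarized as a single consensus sequence, the following is a minimal sketch in Python. It is not HAPHPIPE code: the function name `consensus_and_minor_variants`, the `min_freq` threshold, and the toy reads are all assumptions made for illustration, and the reads are assumed to be already aligned to equal length.

```python
from collections import Counter


def consensus_and_minor_variants(aligned_reads, min_freq=0.01):
    """Return the majority-rule consensus and the per-site minor variants.

    aligned_reads: equal-length strings, one per read (assumed pre-aligned).
    min_freq: hypothetical reporting threshold for minor variants.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    minor = {}  # position -> {base: frequency}
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads)
        total = sum(counts.values())
        # Rank by count (descending), then by base so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus.append(ranked[0][0])
        # Everything except the majority base is a candidate minor variant.
        site_minor = {
            base: count / total
            for base, count in ranked[1:]
            if count / total >= min_freq
        }
        if site_minor:
            minor[pos] = site_minor
    return "".join(consensus), minor


# Toy example: one of four reads carries a T at position 1.
reads = ["ACGT", "ACGT", "ACGT", "ATGT"]
consensus, minor = consensus_and_minor_variants(reads)
```

In this toy case the consensus is `ACGT`, and the T present in a quarter of the reads survives only in the `minor` dictionary; that is the kind of intrahost information a consensus-only summary discards.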

2.
Genetic analysis of hepatitis B virus (HBV) frequently involves study of intra-host variants, identification of which is commonly achieved using short regions of the HBV genome. However, the use of short sequences significantly limits evaluation of genetic relatedness among HBV strains. Although analysis of HBV complete genomes using genetic cloning has been developed, its application is highly labor intensive and practiced only infrequently. We describe here a novel approach to whole genome (WG) HBV quasispecies analysis based on end-point, limiting-dilution real-time PCR (EPLD-PCR) for amplification of single HBV genome variants, and their subsequent sequencing. EPLD-PCR was used to analyze WG quasispecies from serum samples of patients (n = 38) infected with HBV genotypes A, B, C, D, E and G. Phylogenetic analysis of the EPLD-isolated HBV-WG quasispecies showed the presence of mixed genotypes, recombinant variants and sub-populations of the virus. A critical observation was that HBV-WG consensus sequences obtained by direct sequencing of PCR fragments without EPLD are genetically close, but not always identical to the major HBV variants in the intra-host population, thus indicating that consensus sequences should be judiciously used in genetic analysis. Sequence-based studies of HBV WG quasispecies should afford a more accurate assessment of HBV evolution in various clinical and epidemiological settings.  相似文献   

3.
Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.  相似文献   

4.
Given the low intraspecific chloroplast diversity detected in northern red oak (Quercus rubra L.), more powerful genetic tools are necessary to accurately characterize Q. rubra chloroplast diversity and structure. We report the sequencing, assembly, and annotation of the chloroplast genome of northern red oak via pyrosequencing and a combination of de novo and reference-guided assembly (RGA). Chloroplast DNA from 16 individuals was separated into four MID-tagged pools for a Genome Sequencer 20 quarter-run (Roche Life Sciences, Indianapolis, IN, USA). A four-step assembly method was used to generate the Q. rubra chloroplast consensus sequence: (1) reads were assembled de novo into contigs, (2) de novo contigs were aligned to a reference genome and merged to produce a consensus sequence, (3) the consensus sequence was aligned to the reference sequence and gaps between contigs were filled with reference sequence to generate a "pseudoreference", and (4) reads were mapped to the pseudoreference using RGA to generate the draft chloroplast genome. One hundred percent of the pseudoreference sequence was covered with a minimum coverage of 2× and an average coverage of 43.75×. The 161,304-bp Q. rubra chloroplast genome draft sequence contained 137 genes and one rps19 pseudogene. The sequence was compared to that of Quercus robur and Q. nigra with 951 and 186 insertion/deletion or SNP polymorphisms detected, respectively. A total of 51 intraspecific polymorphisms were detected among four northern red oak individuals. The fully sequenced and annotated Q. rubra chloroplast genome containing locations of interspecific and intraspecific polymorphisms will be essential for studying population differentiation, phylogeography, and evolutionary history of this species as well as meeting management goals such as monitoring reintroduced populations, tracking wood products, and certifying seed lots and forests.  相似文献   
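The coverage figures quoted above (fraction of the pseudoreference covered, minimum and average depth) can be computed from read placements along the lines of the sketch below. It is an illustrative assumption rather than the authors' method: `coverage_stats` is a hypothetical helper, and alignments are represented as 0-based, half-open `(start, end)` intervals.

```python
def coverage_stats(reference_length, alignments):
    """Summarize per-base depth over a reference.

    alignments: iterable of (start, end) intervals, 0-based and half-open.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    depth = [0] * reference_length
    for start, end in alignments:
        # Clip each alignment to the reference before counting.
        for pos in range(max(start, 0), min(end, reference_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return {
        "min": min(depth),
        "mean": sum(depth) / reference_length,
        "fraction_covered": covered / reference_length,
    }


# Toy example: three reads over a 10-base reference.
stats = coverage_stats(10, [(0, 6), (4, 10), (0, 10)])
```

For real data a per-base list is wasteful; a difference array or an existing depth tool would be the usual choice, but the definition of the three statistics is the same.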

5.
The HCV genome exhibits significant intra-host genetic heterogeneity as the result of accumulation of mutations during viral replication. At each point in time during the infection, the viral population is composed of a dominant master sequence and a number of sequences diverging from the master sequence to various extents (the viral quasi-species). The quasispecies is a complex, dynamic distribution of nonidentical, but related, replicons. In these populations, viral variants may undergo very large changes in their fitness (the replicative adaptability of an organism to its environment), including dramatic fitness loss and important fitness gains. The biological impact of this event may theoretically include modifications of tropism, appearance of escape mutants, changes in pathogenic potential, and resistance to antiviral agents. A growing body of molecular and clinical data currently suggests that both inter- and intra-host genetic heterogeneity of HCV have crucial biological and medical implications, influencing not only infection prevention, but also clinical progression of chronic liver disease in persistently infected subjects, HCV infection of non-liver cells, and response to the anti-viral therapy.  相似文献   

6.
Massively parallel sequencing (MPS) technologies, such as 454-pyrosequencing, allow for the identification of variants in sequence populations at lower levels than consensus sequencing and most single-template Sanger sequencing experiments. We sought to determine if the greater depth of population sampling attainable using MPS technology would allow detection of minor variants in HIV founder virus populations very early in infection in instances where Sanger sequencing detects only a single variant. We compared single nucleotide polymorphisms (SNPs) during acute HIV-1 infection from 32 subjects using both single-template Sanger sequencing and 454-pyrosequencing. Pyrosequences from a median of 2400 viral templates per subject, encompassing 40% of the HIV-1 genome, were compared to a median of five individually amplified near full-length viral genomes sequenced using Sanger technology. There was no difference in the consensus nucleotide sequences over the 3.6 kb compared in 84% of the subjects infected with single founders and 33% of subjects infected with multiple founder variants; among the subjects with disagreements, mismatches were found in less than 1% of the sites evaluated (of a total of nearly 117,000 sites across all subjects). The majority of the SNPs observed only in pyrosequences were present at less than 2% of the subject’s viral sequence population. These results demonstrate the utility of the Sanger approach for study of early HIV infection, provide guidance regarding the design, utility and limitations of population sequencing from variable template sources, and emphasize parameters for improving the interpretation of massively parallel sequencing data to address important questions regarding target sequence evolution.

7.

Background

High genetic diversity at both the inter- and intra-host levels is a hallmark of RNA viruses, owing to the error-prone nature of their genome replication. Several groups have evaluated the extent of viral variability using different RNA virus deep sequencing methods. Although much of this effort has been dedicated to pathogens that cause chronic infections in humans, few studies have investigated arthropod-borne, acute viral infections.

Methods and Principal Findings

We deep sequenced the complete genome of ten DENV2 isolates from representative classical and severe cases sampled in a large outbreak in Brazil using two different approaches. Analysis of the consensus genomes confirmed the larger extent of the 2010 epidemic in comparison to a previous epidemic caused by the same viruses in another city two years before (genetic distance = 0.002 and 0.0008 respectively). Analysis of viral populations within the host revealed a high level of conservation. After excluding homopolymer regions of 454/Roche generated sequences, we found 10 to 44 variable sites per genome population at a frequency of >1%, resulting in very low intra-host genetic diversity. While up to 60% of all variable sites at intra-host level were non-synonymous changes, only 10% of inter-host variability resulted from non-synonymous mutations, indicative of purifying selection at the population level.

Conclusions and Significance

Despite the error-prone nature of RNA-dependent RNA-polymerase, dengue viruses maintain low levels of intra-host variability.

8.
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.  相似文献   

9.
Next-generation sequencing (NGS) has revolutionized genetics and enabled the accurate identification of many genetic variants across many genomes. However, detection of biologically important low-frequency variants within genetically heterogeneous populations remains challenging, because they are difficult to distinguish from intrinsic NGS sequencing error rates. Approaches to overcome these limitations are essential to detect rare mutations in large cohorts, virus or microbial populations, mitochondria heteroplasmy, and other heterogeneous mixtures such as tumors. Modifications in library preparation can overcome some of these limitations, but are experimentally challenging and restricted to skilled biologists. This paper describes a novel quality filtering and base pruning pipeline, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER), designed to detect sequence variants in a complex population with high sequence similarity derived from All-Codon-Scanning (ACS) mutagenesis. A novel fast alignment algorithm, designed for the specified application, has O(n) time complexity. CHOPER was applied to a p53 cancer mutant reactivation study derived from ACS mutagenesis. Relative to error filtering based on Phred quality scores, CHOPER improved accuracy by about 13% while discarding only half as many bases. These results are a step toward extending the power of NGS to the analysis of genetically heterogeneous populations.  相似文献   
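The baseline that CHOPER is compared against, error filtering based on Phred quality scores, rests on the standard relation Q = -10 log10(p) between a quality score Q and the base-call error probability p. The sketch below shows that baseline only, not CHOPER itself; the function names, the Phred+33 encoding, and the `min_q=20` cutoff are assumptions chosen for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality_string]


def mask_low_quality(sequence, quality_string, min_q=20):
    """Replace bases whose quality is below min_q with 'N'."""
    quals = decode_phred33(quality_string)
    if len(quals) != len(sequence):
        raise ValueError("sequence and quality string differ in length")
    return "".join(
        base if q >= min_q else "N" for base, q in zip(sequence, quals)
    )


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
masked = mask_low_quality("ACGT", "II#I")
```

A cutoff of Q20 corresponds to an expected error rate of 1% per base, which is why variants at or below roughly that frequency are hard to separate from sequencing noise with this kind of filter alone.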

10.
Viruses of the Bacteria and Archaea play important roles in microbial evolution and ecology, and yet viral dynamics in natural systems remain poorly understood. Here, we created de novo assemblies from 6.4 Gbp of metagenomic sequence from eight community viral concentrate samples, collected from 12 h to 3 years apart from hypersaline Lake Tyrrell (LT), Victoria, Australia. Through extensive manual assembly curation, we reconstructed 7 complete and 28 partial novel genomes of viruses and virus-like entities (VLEs, which could be viruses or plasmids). We tracked these 35 populations across the eight samples and found that they are generally stable on the timescale of days and transient on the timescale of years, with some exceptions. Cross-detection of the 35 LT populations in three previously described haloviral metagenomes was limited to a few genes, and most previously sequenced haloviruses were not detected in our samples, though 3 were detected upon reducing our detection threshold from 90% to 75% nucleotide identity. Similar results were obtained when we applied our methods to haloviral metagenomic data previously reported from San Diego, CA: 10 contigs that we assembled from that system exhibited a variety of detection patterns on a timescale of weeks to 1 month but were generally not detected in LT. Our results suggest that most haloviral populations have a limited or, possibly, a temporally variable global distribution. This study provides high-resolution insight into viral biogeography and dynamics and it places "snapshot" viral metagenomes, collected at a single time and location, in context.  相似文献   

11.
Herpes simplex virus types 1 and 2 (HSV-1 and HSV-2, respectively) are prevalent human pathogens of clinical relevance that establish lifelong latency in the nervous system. They have been considered, along with the rest of the Herpesviridae family, to exhibit a low level of genetic diversity during viral replication. However, the marked ability of these viruses to evolve rapidly under different selective pressures does not correlate with that presumed genetic stability. High-throughput sequencing has revealed that heterogeneous or plaque-purified populations of both serotypes contain a broad range of genetic diversity, in terms of number and frequency of minor genetic variants, both in vivo and in vitro. This is reminiscent of the quasispecies phenomenon traditionally associated with RNA viruses. Here, by plaque purification of two selected viral clones of each viral subtype, we reduced the high level of genetic variability found in the original viral stocks to more genetically homogeneous populations. After deeply characterizing the genetic diversity present in the purified viral clones as a high-confidence baseline, we examined the generation of de novo genetic diversity under culture conditions. We found that both serotypes gradually increased the number of de novo minor variants, as well as their frequency, in two different cell types after just five and ten passages. Remarkably, HSV-2 populations displayed a much greater rise in nonconservative de novo minor variants than their HSV-1 counterparts. Most of these minor variants exhibited a very low frequency in the population, which increased over sequential passages. These newly appearing minor variants largely impacted the coding diversity of HSV-2, and we found some genes to be more prone to harbor higher variability. These data show that herpesviruses generate de novo genetic diversity differentially under equal in vitro culture conditions. This might have contributed to the evolutionary divergence of HSV-1 and HSV-2 as they adapted to different anatomical niches, boosted by the selective pressures found in each epithelial and neuronal tissue.

12.

Background

With an estimated 38 million people worldwide currently infected with human immunodeficiency virus (HIV), and an additional 4.1 million people becoming infected each year, it is important to understand how this virus mutates and develops resistance in order to design successful therapies.

Methodology/Principal Findings

We report a novel experimental method for amplifying full-length HIV genomes without the use of sequence-specific primers for high-throughput DNA sequencing, followed by assembly of full-length viral genome sequences from the resulting large dataset. Illumina was chosen for sequencing due to its ability to provide greater coverage of the HIV genome compared to prior methods, allowing for more comprehensive characterization of the heterogeneity present in the HIV samples analyzed. Our novel amplification method in combination with Illumina sequencing was used to analyze two HIV populations: a homogeneous HIV population based on the canonical NL4-3 strain and a heterogeneous viral population obtained from an HIV patient's infected T cells. In addition, the resulting sequence was analyzed using a new computational approach to obtain a consensus sequence and several metrics of diversity.

Significance

This study demonstrates how a lower-bias amplification method in combination with next-generation DNA sequencing provides in-depth, complete coverage of the HIV genome, enabling a stronger characterization of the quasispecies present in a clinically relevant HIV population as well as future study of how HIV mutates in response to a selective pressure.

13.
Swine vesicular disease virus (SVDV) is an enterovirus that is both genetically and antigenically closely related to human coxsackievirus B5 within the Picornaviridae family. SVDV is the causative agent of a highly contagious (though rarely fatal) vesicular disease in pigs. We report a rapid method that is suitable for sequencing the complete protein-encoding sequences of SVDV isolates in which the RNA is relatively intact. The approach couples a single PCR amplification reaction, using only a single PCR primer set to amplify the near-complete SVDV genome, with deep-sequencing using a small fraction of the capacity of a Roche GS FLX sequencing platform. Sequences were initially verified through one of two criteria; either a match between a de novo assembly and a reference mapping, or a match between all of five different reference mappings performed against a fixed set of starting reference genomes with significant genetic distances within the same species of viruses. All reference mappings used an iterative method to avoid bias. Further verification was achieved through phylogenetic analysis against published SVDV genomes and additional Enterovirus B sequences. This approach allows high confidence in the obtained consensus sequences, as well as provides sufficiently high and evenly dispersed sequence coverage to allow future studies of intra-host variation.  相似文献   

14.
15.
16.
Rapidly evolving viruses are a major threat to human health. Such viruses are often highly pathogenic (e.g., influenza virus, HIV, Ebola virus) and routinely circumvent therapeutic intervention through mutational escape. Error-prone genome replication generates heterogeneous viral populations that rapidly adapt to new selection pressures, leading to resistance that emerges with treatment. However, population heterogeneity bears a cost: when multiple viral variants replicate within a cell, they can potentially interfere with each other, lowering viral fitness. This genetic interference can be exploited for antiviral strategies, either by taking advantage of a virus’s inherent genetic diversity or through generating de novo interference by engineering a competing genome. Here, we discuss two such antiviral strategies, dominant drug targeting and therapeutic interfering particles. Both strategies harness the power of genetic interference to surmount two particularly vexing obstacles—the evolution of drug resistance and targeting therapy to high-risk populations—both of which impede treatment in resource-poor settings.  相似文献   

17.
Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.  相似文献   

18.
We review evidence that cloned (or uncloned) populations of most RNA viruses do not consist of a single genome species of defined sequence, but rather of heterogeneous mixtures of related genomes (quasispecies). Due to very high mutation rates, genomes of a quasispecies virus population share a consensus sequence but differ from each other and from the consensus sequence by one, several, or many mutations. Viral genome analyses by sequencing, fingerprinting, cDNA cloning etc. indicate that most viral RNA populations (quasispecies) contain all possible single and double genomic site mutations and varying proportions of triple, quadruple, etc. site mutations. This quasispecies structure of RNA virus populations has many important theoretical and practical implications because mutations at only one or a few sites may alter the phenotype of an RNA virus.  相似文献   
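The claim that a population can contain all possible single and double site mutations is easier to weigh once the size of those mutant sets is written down. For a genome of length L over a four-letter alphabet there are 3L single-site mutants and 9·C(L, 2) double-site mutants. The sketch below does that arithmetic; the function name and the genome length of 10,000 nucleotides are assumptions picked for illustration, not figures from the abstract.

```python
from math import comb


def mutant_neighborhood_sizes(genome_length, alphabet_size=4):
    """Count distinct single- and double-site substitution mutants."""
    alternatives = alphabet_size - 1  # other bases available at each site
    single = alternatives * genome_length
    double = alternatives**2 * comb(genome_length, 2)
    return single, double


# Hypothetical genome length of 10,000 nucleotides.
single, double = mutant_neighborhood_sizes(10_000)
```

Under that assumed length there are 30,000 single mutants and 449,955,000 double mutants, so the double-mutant set is only fully represented in populations that are substantially larger than a few hundred million genomes.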

19.
Dengue virus (DENV) infection of an individual human or mosquito host produces a dynamic population of closely-related sequences. This intra-host genetic diversity is thought to offer an advantage for arboviruses to adapt as they cycle between two very different host species, but it remains poorly characterized. To track changes in viral intra-host genetic diversity during horizontal transmission, we infected Aedes aegypti mosquitoes by allowing them to feed on DENV2-infected patients. We then performed whole-genome deep-sequencing of human- and matched mosquito-derived DENV samples on the Illumina platform and used a sensitive variant-caller to detect single nucleotide variants (SNVs) within each sample. >90% of SNVs were lost upon transition from human to mosquito, as well as from mosquito abdomen to salivary glands. Levels of viral diversity were maintained, however, by the regeneration of new SNVs at each stage of transmission. We further show that SNVs maintained across transmission stages were transmitted as a unit of two at maximum, suggesting the presence of numerous variant genomes carrying only one or two SNVs each. We also present evidence for differences in selection pressures between human and mosquito hosts, particularly on the structural and NS1 genes. This analysis provides insights into how population drops during transmission shape RNA virus genetic diversity, has direct implications for virus evolution, and illustrates the value of high-coverage, whole-genome next-generation sequencing for understanding viral intra-host genetic diversity.  相似文献   
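The loss and regeneration of SNVs across a transmission step can be expressed as simple set operations on the variant calls from matched samples. The sketch below is an assumption about how such a comparison might be coded, not the study's pipeline: `fraction_lost` is a hypothetical helper, SNVs are represented as `(position, reference_base, alternate_base)` tuples, and the calls are invented toy data.

```python
def fraction_lost(source_snvs, recipient_snvs):
    """Fraction of SNVs in the source sample that are absent in the recipient."""
    source = set(source_snvs)
    if not source:
        raise ValueError("source sample has no SNVs")
    lost = source - set(recipient_snvs)
    return len(lost) / len(source)


# Invented toy calls: (position, reference base, alternate base).
human = {(101, "A", "G"), (250, "C", "T"), (999, "G", "A"), (1500, "T", "C")}
mosquito = {(250, "C", "T"), (1200, "A", "C")}

lost = fraction_lost(human, mosquito)
newly_arisen = mosquito - human  # SNVs seen only after transmission
```

Comparing `lost` with the size of `newly_arisen` is what distinguishes a bottleneck that merely removes variants from one in which diversity is replenished at each stage.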

20.
Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA (including poly(rA) carrier and ribosomal RNA) from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol as it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies.
