Similar articles
 20 similar articles found
1.
Gastric cancer (GC) and breast cancer (BrC) are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections in both GC and BrC. Whole-genome sequencing (WGS) technologies make it possible to search for viral agents in tissues of patients with cancer. These technologies have already contributed to establishing virus-cancer associations and to the discovery of new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. To get an idea of cost-effective conditions for experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation platform Illumina GAIIx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline.

2.
Acquisition of genetic material from viruses by their hosts can generate inter-host structural genome variation. We developed computational tools enabling us to study virus-derived structural variants (SVs) in population-scale whole genome sequencing (WGS) datasets and applied them to 3,332 humans. Although SVs had already been cataloged in these subjects, we found previously-overlooked virus-derived SVs. We detected non-germline SVs derived from squirrel monkey retrovirus (SMRV), human immunodeficiency virus 1 (HIV-1), and human T lymphotropic virus (HTLV-1); these variants are attributable to infection of the sequenced lymphoblastoid cell lines (LCLs) or their progenitor cells and may impact gene expression results and the biosafety of experiments using these cells. In addition, we detected new heritable SVs derived from human herpesvirus 6 (HHV-6) and human endogenous retrovirus-K (HERV-K). We report the first solo-direct repeat (DR) HHV-6 likely to reflect DR rearrangement of a known full-length endogenous HHV-6. We used linkage disequilibrium between single nucleotide variants (SNVs) and variants in reads that align to HERV-K, which often cannot be mapped uniquely using conventional short-read sequencing analysis methods, to locate previously-unknown polymorphic HERV-K loci. Some of these loci are tightly linked to trait-associated SNVs, some are in complex genome regions inaccessible by prior methods, and some contain novel HERV-K haplotypes likely derived from gene conversion from an unknown source or introgression. These tools and results broaden our perspective on the coevolution between viruses and humans, including ongoing virus-to-human gene transfer contributing to genetic variation between humans.

3.
Genomics, 2020, 112(2): 1872-1878
Whole genome sequencing (WGS) is a widely available, inexpensive means of providing a wealth of information about an organism's diversity and evolution. However, WGS for many pathogenic bacteria remains limited because they are difficult, slow and/or dangerous to culture. To avoid culturing, metagenomic sequencing can be performed directly on samples, but the sequencing effort required to characterize low-frequency organisms can be expensive. Recently developed methods for selective whole genome amplification (SWGA) can enrich target DNA to provide efficient sequencing. We amplified Coxiella burnetii (a bacterial select agent and human/livestock pathogen) from three environmental samples that were overwhelmed with host DNA. The 68- to 147-fold enrichment of the bacterial sequences provided enough genome coverage for SNP analyses and phylogenetic placement. SWGA is a valuable tool for the study of difficult-to-culture organisms and has the potential to facilitate high-throughput population characterizations as well as targeted epidemiological or forensic investigations.
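
A fold enrichment such as the one quoted above can be read as the ratio of the target read fraction after amplification to the fraction before it. The sketch below illustrates that arithmetic only; the function name and the read counts are hypothetical and are not taken from the study.

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Ratio of the target read fraction after enrichment to the fraction before."""
    if total_before <= 0 or total_after <= 0 or target_before <= 0:
        raise ValueError("read counts must be positive")
    fraction_before = target_before / total_before
    fraction_after = target_after / total_after
    return fraction_after / fraction_before


# Hypothetical counts: 50 target reads per million before amplification,
# 5,000 target reads per million afterwards.
example = fold_enrichment(50, 1_000_000, 5_000, 1_000_000)
print(f"fold enrichment: {example:.0f}x")
```

With these made-up counts the target fraction rises from 0.005% to 0.5% of all reads.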

4.
5.
6.
It is commonly accepted that there are many unknown viruses on the planet. For the known viruses, do we know their prevalence, even in our experimental systems? Here we report a virus survey using recently published small RNA (sRNA) sequencing datasets. The sRNA reads were assembled, and contigs were screened for virus homologues against the NCBI nucleotide (nt) database using the BLASTn program. To our surprise, approximately 30% (28 out of 94) of publications had high-scoring viral sequences in their datasets. Among them, only two publications reported virus infections. Though viral vectors were used in some of the publications, virus sequences without any identifiable source appeared in more than 20 publications. By determining the distributions of viral reads and the antiviral RNA interference (RNAi) pathways using the sRNA profiles, we showed evidence that many of the viruses identified were indeed infecting their hosts and generating host RNAi responses. As virus infections affect many aspects of host molecular biology and metabolism, the presence and impact of viruses need to be actively investigated in experimental systems.

7.
Bovine enteroviruses as indicators of fecal contamination   Cited by: 2 (self-citations: 0; citations by others: 2)
Surface waters frequently have been contaminated with human enteric viruses, and it is likely that animal enteric viruses have contaminated surface waters also. Bovine enteroviruses (BEV), found in cattle worldwide, usually cause asymptomatic infections and are excreted in the feces of infected animals in large numbers. In this study, the prevalence and genotype of BEV in a closed herd of cattle were evaluated and compared with BEV found in animals in the immediate environment and in environmental specimens. BEV was found in feces from 76% of cattle, 38% of white-tailed deer, and one of three Canada geese sharing the same pastures, as well as the water obtained from animal watering tanks, from the pasture, from streams running from the pasture to an adjacent river, and from the river, which emptied into the Chesapeake Bay. Furthermore, BEV was found in oysters collected from that river downstream from the farm. These findings suggest that BEV could be used as an indicator of fecal pollution originating from animals (cattle and/or deer). Partial sequence analysis of the viral genomes indicates that different viral variants coexist in the same area. Identifying the viral strains found in the animals and in the contaminated areas by sequencing the RNA genome could provide a tool to find the origin of the contamination and should be useful for epidemiological and viral molecular evolution studies.

8.
Bovine Enteroviruses as Indicators of Fecal Contamination   Cited by: 6 (self-citations: 6; citations by others: 0)

9.
We have developed a full-genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with SLIM, a novel iterative Python algorithm, for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated, and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults but may be involved in the pathogenesis of HIV-1 infection. It will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

10.
Molecular identification of mixed-species pollen samples has a range of applications in various fields of research. To date, such molecular identification has primarily been carried out via amplicon sequencing, but whole-genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing methodology and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200 Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k-mer identification method with reference libraries constructed from full-genome and short read archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen, with near 100% identification of species in mixtures, but it generated higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) relative to rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion, but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS-based pollen identification is the lack of representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect that eventually genomics methods will become the methods of choice for species identification and quantification of mixed-species pollen samples.
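
To give a feel for k-mer based read assignment, the toy sketch below indexes the k-mers of each reference and assigns a read to the taxon with the most uniquely matching k-mers. It is a deliberately simplified stand-in and not Kraken's actual algorithm; the function names, the reference sequences and the value of k are all hypothetical.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most uniquely matching k-mers, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, set())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Made-up reference sequences for two hypothetical taxa.
references = {"taxon_a": "ACGTACGTTTGACCA", "taxon_b": "GGCATTCAGGCTAAC"}
index = build_index(references, k=5)
assigned = classify("CGTTTGACC", index, k=5)
unassigned = classify("TTTTTTTT", index, k=5)
```

Reads whose k-mers are absent from every reference stay unassigned, which mirrors the database-coverage limitation the abstract mentions.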

11.
Discovery of new viruses has been boosted by novel deep sequencing technologies. Currently, many viruses can be identified by sequencing without knowledge of the pathogenicity of the virus. However, attributing the presence of a virus in patient material to a disease in the patient can be a challenge. One approach to meet this challenge is identification of viral sequences based on enrichment by autologous patient antibody capture. This method facilitates identification of viruses that have provoked an immune response within the patient and may increase the sensitivity of the current virus discovery techniques. To demonstrate the utility of this method, virus discovery deep sequencing (VIDISCA-454) was performed on clinical samples from 19 patients: 13 with a known respiratory viral infection and 6 with a known gastrointestinal viral infection. Patient sera were collected from one to several months after the acute infection phase. Input and antibody capture material was sequenced and enrichment was assessed. In 18 of the 19 patients, viral reads from immunogenic viruses were enriched by antibody capture (ranging from 1.5x to 343x in respiratory material, and from 1.4x to 53x in stool). Enriched reads were also determined in an identity-independent manner by using a novel algorithm, Xcompare. In 16 of the 19 patients, 21% to 100% of the enriched reads were derived from infecting viruses. In conclusion, the technique provides a novel approach to specifically identify immunogenic viral sequences among the bulk of sequences that are usually encountered during virus discovery metagenomics.

12.
Genomics, 2023, 115(2): 110556
As the most readily adopted molecular screening test, low-pass WGS of maternal plasma cell-free DNA for aneuploidy detection generates a vast amount of genomic data. This large-scale method also allows for high-throughput virome screening. NIPT sequencing data, yielding 6.57 terabases of data from 187.8 billion reads, from 12,951 pregnant Turkish women was used to investigate the prevalence and abundance of viral DNA in plasma. Among the 22 virus sequences identified in 12% of participants were human papillomavirus, herpesvirus, betaherpesvirus and anellovirus. We observed a unique pattern of circulating viral DNA with a high prevalence of papillomaviruses. The prevalence of herpesviruses/anellovirus was similar among Turkish, European and Dutch populations. Hepatitis B prevalence was remarkably low in Dutch, European and Turkish populations, but higher in China. WGS data revealed that herpesvirus/anelloviruses are naturally found in European populations. This represents the first comprehensive research on the plasma virome of pregnant Turkish women.
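
The totals quoted above imply a mean read length and a per-participant sequencing depth that the abstract does not state directly. The back-of-the-envelope sketch below derives them, assuming (hypothetically) that reads are spread evenly across participants; the variable names are illustrative only.

```python
# Figures quoted in the abstract above.
TOTAL_BASES = 6.57e12   # 6.57 terabases
TOTAL_READS = 187.8e9   # 187.8 billion reads
PARTICIPANTS = 12_951

# Assumption: reads are distributed evenly across participants.
mean_read_length = TOTAL_BASES / TOTAL_READS
reads_per_participant = TOTAL_READS / PARTICIPANTS
bases_per_participant = TOTAL_BASES / PARTICIPANTS

print(f"mean read length: {mean_read_length:.1f} bases")
print(f"reads per participant: {reads_per_participant / 1e6:.1f} million")
print(f"bases per participant: {bases_per_participant / 1e9:.2f} Gb")
```

The result is about 35 bases per read and roughly half a gigabase per participant, which fits the description of the data as low-pass WGS.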

13.
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read." Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of "overlaps," i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the "UMD Overlapper," can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera's Drosophila reads. When we replaced Celera's overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
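
The relationship between a quality value and an error probability can be sketched as follows, assuming the quality values follow the widely used Phred convention Q = -10 log10(P). The abstract does not name the scale, so that convention and the helper names are assumptions made for illustration.

```python
import math


def phred_to_error_prob(quality):
    """Error probability implied by a Phred-scaled quality value (assumed scale)."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(probability):
    """Phred-scaled quality value for a given per-base error probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(probability)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given its quality values."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Hypothetical three-base read with quality values 10, 20 and 30.
read_errors = expected_errors([10, 20, 30])
```

Under this convention a quality value of 20 corresponds to a 1% chance that the base is wrong, and 30 to a 0.1% chance.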

14.

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.
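
The 0.5% detection limit reported in the Results can be pictured as a simple frequency filter on per-position base counts. The sketch below is only an illustration of that idea and not the authors' pipeline; the function name, the input layout and the counts are hypothetical.

```python
def minor_variants(base_counts, min_frequency=0.005):
    """List (position, base, frequency) for non-consensus bases at or above the threshold.

    base_counts maps a genome position to a dict of base -> read count.
    The 0.5% default mirrors the detection limit quoted in the abstract.
    """
    calls = []
    for position, counts in sorted(base_counts.items()):
        depth = sum(counts.values())
        if depth == 0:
            continue
        consensus = max(counts, key=counts.get)
        for base, count in sorted(counts.items()):
            frequency = count / depth
            if base != consensus and frequency >= min_frequency:
                calls.append((position, base, frequency))
    return calls


# Hypothetical pileup: a 1% variant at position 100 and a 0.1% variant at 101.
pileup = {
    100: {"A": 9900, "G": 100},
    101: {"C": 9990, "T": 10},
}
calls = minor_variants(pileup)
```

Only the variant at position 100 passes the filter; the 0.1% variant at position 101 falls below the threshold and would be treated as indistinguishable from background error.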

15.
New DNA viruses identified in patients with acute viral infection syndrome   Cited by: 11 (self-citations: 0; citations by others: 11)
A sequence-independent PCR amplification method was used to identify viral nucleic acids in the plasma samples of 25 individuals presenting with symptoms of acute viral infection following high-risk behavior for human immunodeficiency virus type 1 transmission. GB virus C/hepatitis G virus was identified in three individuals and hepatitis B virus in one individual. Three previously undescribed DNA viruses were also detected, a parvovirus and two viruses related to TT virus (TTV). Nucleic acids in human plasma that were distantly related to bacterial sequences or with no detectable similarities to known sequences were also found. Nearly complete viral genome sequencing and phylogenetic analysis confirmed the presence of a new parvovirus distinct from known human and animal parvoviruses and of two related TTV-like viruses highly divergent from both the TTV and TTV-like minivirus groups. The detection of two previously undescribed viral species in a small group of individuals presenting acute viral syndrome with unknown etiology indicates that a rich yield of new human viruses may be readily identifiable using simple methods of sequence-independent nucleic acid amplification and limited sequencing.

16.
Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance.
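
One way to compare the precision/recall pairs quoted above on a single scale is the F1 score, the harmonic mean of precision and recall. The abstract does not report F1, so the values computed below are derived here purely for illustration, and the dictionary layout is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs on bulk ATAC-seq reads, as quoted in the abstract.
bulk_results = {
    ("GATK", "SNV"): (0.92, 0.97),
    ("GATK", "indel"): (0.87, 0.82),
    ("VarCA", "SNV"): (0.99, 0.95),
    ("VarCA", "indel"): (0.93, 0.80),
}

f1_by_caller = {key: f1_score(p, r) for key, (p, r) in bulk_results.items()}
for (caller, variant_type), score in sorted(f1_by_caller.items()):
    print(f"{caller:5s} {variant_type:5s} F1 = {score:.3f}")
```

On these figures VarCA scores higher than GATK for both SNVs and indels, even though its recall alone is lower, which is consistent with the abstract's summary.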

17.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.

18.

Background

Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.
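
The idea of generating simulated reads from a known genome, as described in the Methodology, can be sketched in a few lines. This toy version draws error-free, unpaired, fixed-length reads uniformly at random and is not the Plantagora platform; the function name, parameters and the example genome are hypothetical.

```python
import random


def simulate_reads(genome, read_length, coverage, seed=0):
    """Sample error-free, fixed-length reads uniformly from a genome string.

    The number of reads is chosen so that reads * read_length is roughly
    coverage * genome length.
    """
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)  # seeded for reproducibility
    n_reads = max(1, round(coverage * len(genome) / read_length))
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(genome) - read_length)
        reads.append(genome[start:start + read_length])
    return reads


# Made-up 1,000-base genome sampled at 10x coverage with 50-base reads.
toy_genome = "ACGT" * 250
toy_reads = simulate_reads(toy_genome, read_length=50, coverage=10)
```

A realistic simulator would also model sequencing errors, paired-end spacing and platform-specific read length distributions, which this sketch leaves out.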

19.
The genome sequence of silkworm, Bombyx mori.   Cited by: 21 (self-citations: 0; citations by others: 21)
We performed threefold shotgun sequencing of the silkworm (Bombyx mori) genome to obtain a draft sequence and establish a basic resource for comprehensive genome analysis. By using the newly developed RAMEN assembler, the sequence data derived from whole-genome shotgun (WGS) sequencing were assembled into 49,345 scaffolds that span a total length of 514 Mb including gaps and 387 Mb without gaps. Because the genome size of the silkworm is estimated to be 530 Mb, almost 97% of the genome has been organized in scaffolds, of which 75% has been sequenced. By carrying out a BLAST search for 50 characteristic Bombyx genes and 11,202 non-redundant expressed sequence tags (ESTs) in a Bombyx EST database against the WGS sequence data, we evaluated the validity of the sequence for elucidating the majority of silkworm genes. Analysis of the WGS data revealed that the silkworm genome contains many repetitive sequences with an average length of <500 bp. These repetitive sequences appear to have been derived from truncated transposons, which are interspersed at 2.5- to 3-kb intervals throughout the genome. This pattern suggests that silkworm may have an active mechanism that promotes removal of transposons from the genome. We also found evidence for insertions of mitochondrial DNA fragments at 9 sites. A search for Bombyx orthologs to Drosophila genes controlling sex determination in the WGS data revealed 11 Bombyx genes and suggested that the sex-determining systems differ profoundly between the two species.
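
The percentages quoted above follow from the three sizes given in the abstract. The short check below reproduces them, assuming that "75% has been sequenced" refers to the ungapped share of the scaffold span; that reading and the variable names are assumptions made for illustration.

```python
# Sizes in megabases, as quoted in the abstract above.
GENOME_SIZE_MB = 530     # estimated genome size
SCAFFOLD_SPAN_MB = 514   # scaffold length including gaps
UNGAPPED_MB = 387        # scaffold length without gaps

# Share of the estimated genome covered by scaffolds.
scaffolded_fraction = SCAFFOLD_SPAN_MB / GENOME_SIZE_MB

# Assumed reading of "75% has been sequenced": ungapped share of the scaffolds.
sequenced_fraction_of_scaffolds = UNGAPPED_MB / SCAFFOLD_SPAN_MB

print(f"genome in scaffolds: {scaffolded_fraction:.1%}")
print(f"scaffold span sequenced: {sequenced_fraction_of_scaffolds:.1%}")
```

Both ratios round to the 97% and 75% stated in the abstract.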

20.

Background

Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure.

Results

The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus.

Conclusion

The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. The implemented process control serves as quality control, ensures comparability of the method and may be used for further method optimization.
