Similar literature
20 similar documents found (search time: 140 ms)
1.
Virus taxonomy has received little attention from the research community despite its broad relevance. In an accompanying paper (C. Lauber and A. E. Gorbalenya, J. Virol. 86:3890-3904, 2012), we have introduced a quantitative approach to hierarchically classify viruses of a family using pairwise evolutionary distances (PEDs) as a measure of genetic divergence. When applied to the six most conserved proteins of the Picornaviridae, it clustered 1,234 genome sequences in groups at three hierarchical levels (to which we refer as the "GENETIC classification"). In this study, we compare the GENETIC classification with the expert-based picornavirus taxonomy and outline differences in the underlying frameworks regarding the relation of virus groups and genetic diversity that represent, respectively, the structure and content of a classification. To facilitate the analysis, we introduce two novel diagrams. The first connects the genetic diversity of taxa to both the PED distribution and the phylogeny of picornaviruses. The second depicts a classification and the accommodated genetic diversity in a standardized manner. Generally, we found striking agreement between the two classifications on species and genus taxa. A few disagreements concern the species Human rhinovirus A and Human rhinovirus C and the genus Aphthovirus, which were split in the GENETIC classification. Furthermore, we propose a new supergenus level and universal, level-specific PED thresholds, not reached yet by many taxa. Since the species threshold is approached mostly by taxa with large sampling sizes and those infecting multiple hosts, it may represent an upper limit on divergence, beyond which homologous recombination in the six most conserved genes between two picornaviruses might not give viable progeny.

2.
Genomics, 2022, 114(4):110414
Classification of viruses into their taxonomic ranks (e.g., order, family, and genus) provides a framework to organize an abundant population of viruses. Next-generation metagenomic sequencing technologies have led to a rapid increase in viral sequencing data, which require bioinformatics tools for taxonomic analysis. Many metagenomic taxonomy classifiers have been developed to study microbiomes, but it is particularly challenging to assign the taxonomy of diverse virus sequences, and there is a growing need for dedicated methods that are optimized to classify virus sequences into their taxa. For taxonomic classification of viruses from metagenomic sequences, we developed VirusTaxo using diverse (e.g., 402 DNA and 280 RNA) genera of viruses. VirusTaxo has an average accuracy of 93% at genus-level prediction in DNA and RNA viruses. VirusTaxo outperformed existing viral taxonomic classifiers, assigning taxonomy to a larger fraction of metagenomic contigs than other methods. Benchmarking of VirusTaxo on a collection of SARS-CoV-2 sequencing libraries and metavirome datasets suggests that VirusTaxo can characterize virus taxonomy from highly diverse contigs and provide a reliable decision on the taxonomy of viruses.

3.
Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within genomic segments, all bases share the same divergence (because they share a most recent common ancestor) when no recombination event has occurred to split them apart. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.

4.
Restriction endonucleases which cleave DNA at specific nucleotide sequences can be used to produce a set of DNA fragments of a viral genome which, when separated by gel electrophoresis, gives a characteristic "fingerprint" for that virus genome. This simple technique has been used to identify and classify DNA viruses of the herpes, adeno, and papova virus groups. Small variants within a given type (e.g., herpes simplex type I) are genetically stable and permit study and identification of individual strains of viruses. Such analyses have recently been applied to study the epidemiology of some DNA virus outbreaks. Restriction endonuclease fingerprinting provides a useful addition to methods for virus identification and classification.
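
The fingerprinting idea in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it treats the genome as a linear string, uses the EcoRI recognition site GAATTC (cut after the first G) as a default, and reports fragment lengths sorted from longest to shortest, roughly the order in which bands would appear on a gel. The function name `digest` and its parameters are hypothetical and do not come from the cited work.

```python
def digest(genome, site="GAATTC", cut_offset=1):
    """Return fragment lengths (longest first) after cutting a linear genome.

    Assumptions: the genome is linear, every occurrence of `site` is cut,
    and the cut falls `cut_offset` bases into the site (1 for EcoRI, G^AATTC).
    """
    fragments = []
    start = 0
    pos = genome.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(genome[start:cut])  # fragment ending at this cut
        start = cut
        pos = genome.find(site, pos + 1)     # look for the next site
    fragments.append(genome[start:])         # trailing fragment
    return sorted((len(f) for f in fragments), reverse=True)


if __name__ == "__main__":
    # Toy sequence with two EcoRI sites; real genomes are far longer.
    print(digest("AAGAATTCTTTTGAATTCC"))
```

Two isolates whose fragment-length lists differ would show different band patterns, which is the basis of the strain comparison described above.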

5.

Background  

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes.

6.
B19 virus is a human virus belonging to the genus Erythrovirus. The genetic diversity among B19 virus isolates has been reported to be very low, with less than 2% nucleotide divergence in the whole genome sequence. We have previously reported the isolation of a human erythrovirus isolate, termed V9, whose sequence was markedly distinct (>11% nucleotide divergence) from that of B19 virus. To date, the V9 isolate remains the unique representative of a new variant in the genus Erythrovirus, and its taxonomic position is unclear. We report here the isolation of 11 V9-related viruses. A prospective study conducted in France between 1999 and 2001 indicates that V9-related viruses actually circulate at a significant frequency (11.4%) along with B19 viruses. Analysis of the nearly full-length genome sequence of one V9-related isolate (D91.1) indicates that the D91.1 sequence clusters together with but is notably distant from the V9 sequence (5.3% divergence) and is distantly related to B19 virus sequences (13.8 to 14.2% divergence). Additional phylogenetic analysis of partial sequences from the V9-related isolates combined with erythrovirus sequences available in GenBank indicates that the erythrovirus group is more diverse than thought previously and can be divided into three well-individualized genotypes, with B19 viruses corresponding to genotype 1 and V9-related viruses being distributed into genotypes 2 and 3.

7.
MOTIVATION: The evolution of viruses is very rapid and, in addition to local point mutations (insertion, deletion, substitution), it also includes frequent recombinations, genome rearrangements and horizontal transfer of genetic materials (HGTs). Evolutionary analysis of viral sequences is therefore a complicated matter for two main reasons. First, due to HGTs and recombinations, the right model of evolution is a network and not a tree. Second, due to genome rearrangements, an alignment of the input sequences is not guaranteed. These facts encourage developing methods for inferring phylogenetic networks that do not require aligned sequences as input. RESULTS: In this work, we present the first computational approach which deals with both genome rearrangements and horizontal gene transfers and does not require a multiple alignment as input. We formalize a new set of computational problems which involve analyzing such complex models of evolution. We investigate their computational complexity, and devise algorithms for solving them. Moreover, we demonstrate the viability of our methods on several synthetic datasets as well as four biological datasets. AVAILABILITY: The code is available from the authors upon request.

8.
The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
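
To make the gap-handling issue raised in this abstract concrete, here is a small sketch of how the same aligned pair can yield different identity scores depending on whether gap columns are counted. It is not SDT's algorithm; the function `pairwise_identity` and its `count_gaps` flag are hypothetical names introduced only for illustration, and the input is assumed to be two already-aligned strings of equal length that use "-" for gaps.

```python
def pairwise_identity(a, b, count_gaps=False):
    """Fraction of identical positions between two pre-aligned sequences.

    Assumption: `a` and `b` are aligned strings of equal length with '-' gaps.
    If count_gaps is False, columns with a gap in either sequence are ignored;
    if True, a gap opposite a residue counts as a compared, mismatching column.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue                      # shared gap carries no information
        if x == "-" or y == "-":
            if count_gaps:
                compared += 1             # gap treated as a mismatch
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


if __name__ == "__main__":
    a, b = "ACGT-A", "ACGTTA"
    print(pairwise_identity(a, b))                    # gaps ignored
    print(pairwise_identity(a, b, count_gaps=True))   # gaps penalised
```

With a demarcation threshold that falls between the two scores, the same pair of sequences would be classified differently under the two conventions, which is the inconsistency the abstract warns about.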

9.
Deng M, Yu C, Liang Q, He RL, Yau SS. PLoS ONE, 2011, 6(3):e17293

Background

Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences.

Methodology

To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV), and mammalian mitochondria. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses.

Conclusions

Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than using multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
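
The natural vector idea described in the Methodology above can be sketched as follows. The abstract does not spell out the vector's components, so this example makes an explicit assumption: for each nucleotide it records the count, the mean position, and a second central moment of the positions normalised by count times sequence length, giving a 12-dimensional vector. A truncated vector like this one is not guaranteed to be one-to-one with the sequence. The names `natural_vector` and `natural_distance` are hypothetical, and Euclidean distance is likewise an assumed choice.

```python
import math


def natural_vector(seq):
    """12-dimensional summary of nucleotide distributions (assumed form).

    For each base in A, C, G, T: count, mean 1-based position, and the
    second central moment of positions divided by (count * sequence length).
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in "ACGT":
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        d2_k = (sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
                if n_k else 0.0)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def natural_distance(a, b):
    """Euclidean distance between the natural vectors of two sequences."""
    return math.dist(natural_vector(a), natural_vector(b))


if __name__ == "__main__":
    print(natural_vector("ACGT"))
    print(natural_distance("ACGTACGT", "ACGTTGCA"))
```

Because each sequence maps to a fixed-length vector independently of the others, a new genome can be added to a stored collection without recomputing existing vectors, which matches the point made in the Conclusions about avoiding realignment.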

10.
The potential threat of another influenza virus pandemic stimulates discussion on how to prepare for such an event. The most reasonable prophylactic approach appears to be the use of effective vaccines. Since influenza and other negative-stranded RNA viruses are amenable to genetic manipulation using transfection by plasmids, it is possible to outline new reverse genetics-based approaches for vaccination against influenza viruses. We suggest three approaches. First, we use a plasmid-only rescue system that allows the rapid generation of high-yield recombinant vaccine strains. Second, we propose developing second-generation live influenza virus vaccines by constructing an attenuated master strain with deletions in the NS1 protein, which acts as an interferon antagonist. Third, we suggest the use of Newcastle disease virus recombinants expressing influenza virus haemagglutinin proteins of pandemic (epizootic) strains as novel vaccine vectors for use in animals and possibly humans.

11.
The genome of the Friend strain of the spleen focus-forming virus (SFFV) has been analyzed by molecular hybridization. SFFV is composed of genetic sequences homologous to Friend type C helper virus (F-MuLV) and SFFV-specific sequences not present in F-MuLV. These SFFV-specific sequences are present in both the Friend and Rauscher strains of murine erythroleukemia virus. The SFFV-specific sequences are partially homologous to three separate strains of mouse xenotropic virus but not to several cloned mouse ecotropic viruses. Thus, the Friend strain of SFFV appears to be a recombinant between a portion of the F-MuLV genome and RNA sequences that are highly related to murine xenotropic viruses. The implications of the acquisition of the xenotropic virus-related sequences are discussed in relation to the leukemogenicity of SFFV, and a model for the pathogenicity of other murine leukemia-inducing viruses is proposed.

12.
A new sequence distance measure for phylogenetic tree construction   (cited 5 times in total: 0 self-citations, 5 citations by others)
MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors.
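
A minimal sketch of an alignment-free distance built on Lempel-Ziv complexity is given below. The abstract does not state the exact formula, so the normalisation used here is an assumption: the distance is max(c(SQ) - c(S), c(QS) - c(Q)) divided by max(c(S), c(Q)), where c counts the components of a Lempel-Ziv (1976-style) parsing and SQ denotes concatenation. The function names are hypothetical, and the quadratic-time parsing is meant for short illustrative strings only.

```python
def lz_complexity(s):
    """Number of components in a Lempel-Ziv (1976-style) parsing of s.

    Each component is the shortest substring starting at the current position
    that does not already occur in the text preceding its last character.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s, q):
    """Assumed normalised distance based on relative LZ complexity."""
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    c_sq, c_qs = lz_complexity(s + q), lz_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_s, c_q)


if __name__ == "__main__":
    print(lz_complexity("AACGTACCATTG"))
    print(lz_distance("AACGTACCATTG", "ACGGTCACCAA"))
```

The intuition is that appending Q to S adds few new components when Q can largely be copied from S, so closely related sequences receive small distances. A matrix of such distances could then be passed to any distance-based tree-building method.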

13.
Barley yellow dwarf virus (BYDV) species PAV occurs frequently in irrigated wheat fields worldwide and can be efficiently transmitted by aphids. Isolates of BYDV-PAV from different countries show great divergence both in genomic sequences and pathogenicity. Despite its economic importance, the genetic structure of natural BYDV-PAV populations and the mechanisms maintaining their high diversity remain poorly explored. In this study, we investigate the dynamics of BYDV-PAV genome evolution utilizing time-structured data sets of complete genomic sequences from 58 isolates from different hosts obtained worldwide. First, we observed that BYDV-PAV exhibits a high frequency of homologous recombination. Second, our analysis revealed that the BYDV-PAV genome evolves under purifying selection and at a substitution rate similar to other RNA viruses (3.158 × 10^-4 nucleotide substitutions/site/year). Phylogeography analyses show that the diversification of BYDV-PAV can be explained by local geographic adaptation as well as by host-driven adaptation. These results increase our understanding of the diversity, molecular evolutionary characteristics and epidemiological properties of an economically important plant RNA virus.

14.
Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.

15.
Hai ming Ni, Da wei Qi, Hongbo Mu. Genomics, 2018, 110(3):180-190
Converting a DNA sequence to an image using chaos game representation (CGR) is an effective genome sequence preprocessing technique that provides a basis for further comparison of different genes. In this paper, we constructed CGR images for the genetic sequences of 10 mammal species, 48 hepatitis E viruses (HEV), and 10 kinds of bacteria, and calculated the mean structural similarity (MSSIM) coefficient between every pair of CGR images. Our analysis shows that the MSSIM coefficient of gene CGR images can accurately reflect the degree of similarity between different genomes. Hierarchical clustering analysis was used to calculate the class affiliation and construct a dendrogram. Large numbers of experiments showed that this method gives results comparable to the traditional Clustal X phylogenetic tree construction method and is significantly faster in the clustering analysis. The MSSIM-combined CGR method was also able to efficiently cluster large genome sequences that traditional multiple sequence alignment methods (e.g., Clustal X, Clustal Omega, Clustal W) cannot classify.
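
The first step of this pipeline, turning a sequence into a CGR image, can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), the starting point at the centre of the unit square, and the grid resolution are conventional choices not stated in the abstract, and the function name `cgr_grid` is hypothetical. The MSSIM comparison and hierarchical clustering steps are not shown.

```python
# Assumed corner layout of the unit square for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_grid(seq, size=8):
    """Count CGR points falling in each cell of a size x size grid.

    Starting from the centre, each base moves the current point halfway
    towards that base's corner; the visited points are binned into cells.
    Characters other than A, C, G, T are skipped.
    """
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_grid("ACGTTGCAACGT", size=4):
        print(row)
```

The resulting count matrix can be treated as a greyscale image, so two genomes of different lengths become two images of the same size that an image similarity measure can compare directly.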

16.
Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as 'Lagos Bat'. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses.

17.
A Cameroonian patient with antibodies reacting simultaneously to human immunodeficiency virus type 1 (HIV-1) group O- and group M-specific V3-loop peptides was identified. In order to confirm that this patient was coinfected with both viruses, PCRs with O- and M-specific discriminating primers corresponding to different regions of the genome were carried out with both primary lymphocyte DNA and the corresponding viral strains isolated from three consecutive patient samples. The PCR data suggested that this patient is coinfected with a group M virus and a recombinant M/O virus. Indeed, only type M gag sequences could be amplified, while for the env region, both type M and O sequences were amplified, from plasma or from DNA extracted from primary lymphocytes. Sequence analysis of a complete recombinant genome isolated from the second sample (97CA-MP645 virus isolate) revealed two intergroup breakpoints, one in the vpr gene and the second in the long terminal repeat region around the TATA box. Comparison of the type M sequences shared by the group M and the recombinant M/O viruses showed that these sequences were closely related, with only 3% genetic distance, suggesting that the M virus was one of the parental viruses. In this report we describe for the first time a recombination event in vivo between viruses belonging to two different groups, leading to a replicative virus. Recombination between strains with such distant lineages (65% overall homology) may contribute substantially to the emergence of new HIV-1 variants. We documented that this virus replicates well and became predominant in vitro. At this time, group O viruses represent a minority of the strains responsible for the HIV-1 pandemic. If such recombinant intergroup viruses gained better fitness, inducing changes in their biological properties compared to the parental group O virus, the prevalence of group O sequences could increase rapidly. This will have important implications for diagnosis of HIV-1 infections by serological and molecular tests, as well as for antiviral treatment.

18.
Research has shown that RNA virus populations are highly variable, most likely due to low fidelity replication of RNA genomes. It is generally assumed that populations of DNA viruses will be less complex and show reduced variability when compared to RNA viruses. Here, we describe the use of high throughput sequencing for a genome wide study of viral populations from urine samples of neonates with congenital human cytomegalovirus (HCMV) infections. We show that HCMV intrahost genomic variability, both at the nucleotide and amino acid level, is comparable to many RNA viruses, including HIV. Within intrahost populations, we find evidence of selective sweeps that may have resulted from immune-mediated mechanisms. Similarly, genome wide, population genetic analyses suggest that positive selection has contributed to the divergence of the HCMV species from its most recent ancestor. These data provide evidence that HCMV, a virus with a large dsDNA genome, exists as a complex mixture of genome types in humans and offer insights into the evolution of the virus.

19.
The limitation of using low electron doses in non-destructive cryo-electron tomography of biological specimens can be partially offset via averaging of aligned and structurally homogeneous subsets present in tomograms. This type of sub-volume averaging is especially challenging when multiple species are present. Here, we tackle the problem of conformational separation and alignment with a “collaborative” approach designed to reduce the effect of the “curse of dimensionality” encountered in standard pair-wise comparisons. Our new approach is based on using the nuclear norm as a collaborative similarity measure for alignment of sub-volumes, and by exploiting the presence of symmetry early in the processing. We provide a strict validation of this method by analyzing mixtures of intact simian immunodeficiency viruses SIV mac239 and SIV CP-MAC. Electron microscopic images of these two virus preparations are indistinguishable except for subtle differences in conformation of the envelope glycoproteins displayed on the surface of each virus particle. By using the nuclear norm-based, collaborative alignment method presented here, we demonstrate that the genetic identity of each virus particle present in the mixture can be assigned based solely on the structural information derived from single envelope glycoproteins displayed on the virus surface.

20.
The completed rice genome sequence will accelerate progress on the identification and functional classification of biologically important genes and serve as an invaluable resource for the comparative analysis of grass genomes. In this study, methods were developed for sequence-based alignment of sorghum and rice chromosomes and for refining the sorghum genetic/physical map based on the rice genome sequence. A framework of 135 BAC contigs spanning approximately 33 Mbp was anchored to sorghum chromosome 3. A limited number of sequences were collected from 118 of the BACs and subjected to BLASTX analysis to identify putative genes and BLASTN analysis to identify sequence matches to the rice genome. Extensive conservation of gene content and order between sorghum chromosome 3 and the homeologous rice chromosome 1 was observed. One large-scale rearrangement was detected involving the inversion of an approximately 59 cM block of the short arm of sorghum chromosome 3. Several small-scale changes in gene collinearity were detected, indicating that single genes and/or small clusters of genes have moved since the divergence of sorghum and rice. Additionally, the alignment of the sorghum physical map to the rice genome sequence allowed sequence-assisted assembly of an approximately 1.6 Mbp sorghum BAC contig. This streamlined approach to high-resolution genome alignment and map building will yield important information about the relationships between rice and sorghum genes and genomic segments and ultimately enhance our understanding of cereal genome structure and evolution.
