Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
RNA virus genomes are compact, often containing multiple overlapping reading frames and functional secondary structure. Consequently, it is thought that evolutionary interactions between nucleotide sites are commonplace in the genomes of these infectious agents. However, the role of epistasis in natural populations of RNA viruses remains unclear. To investigate the pervasiveness of epistasis in RNA viruses, we used a parsimony-based computational method to identify pairs of co-occurring mutations along phylogenies of 177 RNA virus genes. This analysis revealed widespread evidence for positive epistatic interactions at both synonymous and nonsynonymous nucleotide sites and in both clonal and recombining viruses, with the majority of these interactions spanning very short sequence regions. These findings have important implications for understanding the key aspects of RNA virus evolution, including the dynamics of adaptation. Additionally, many comparative analyses that utilize the phylogenetic relationships among gene sequences assume that mutations represent independent, uncorrelated events. Our results show that this assumption may often be invalid.

3.
Existing computational methods for RNA secondary-structure prediction tacitly assume that RNA encodes only functional RNA structures. However, experimental studies have revealed that some RNA sequences, e.g. compact viral genomes, can simultaneously encode functional RNA structures as well as proteins, and evidence is accumulating that this phenomenon may also be found in eukaryotes. Here we present the first comparative method, called RNA-DECODER, which explicitly takes the known protein-coding context of an RNA-sequence alignment into account in order to predict evolutionarily conserved secondary-structure elements, which may span both coding and non-coding regions. RNA-DECODER employs a stochastic context-free grammar together with a set of carefully devised phylogenetic substitution models, which can disentangle and evaluate the different kinds of overlapping evolutionary constraints that arise. We show that RNA-DECODER's parameters can be automatically trained to successfully fold known secondary structures within the HCV genome. We scan the genomes of HCV and polio virus for conserved secondary-structure elements, and analyze performance as a function of available evolutionary information. On known secondary structures, RNA-DECODER shows a sensitivity similar to the programs MFOLD, PFOLD and RNAALIFOLD. When scanning the entire genomes of HCV and polio virus for structure elements, RNA-DECODER's results indicate a markedly higher specificity than MFOLD, PFOLD and RNAALIFOLD.

4.
Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G+C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.
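
The "frame-specific G+C representation" mentioned in this abstract can be illustrated with a small calculation. The following is only a minimal sketch, assuming the quantity of interest is the G+C fraction at each of the three codon positions of an in-frame sequence; the function name `gc_by_codon_position` is hypothetical, and the abstract does not say how the authors actually computed or combined this measure.

```python
def gc_by_codon_position(seq: str) -> tuple[float, float, float]:
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `seq` starts in frame; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases))
    return tuple(fractions)


# Illustrative input: codons ATG, GCC, GCG
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGCG")
```

Running the same function on each of the three possible frames of a region and comparing the third-position values is one way such a measure might help indicate which frame is coding, although that use is an assumption here.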

5.
Protein-coding genes often contain long overlapping open-reading frames (ORFs), which may or may not be functional. Current methods that utilize the signature of purifying selection to detect functional overlapping genes are limited to the analysis of sequences from divergent species, thus rendering them inapplicable to genes found only in closely related sequences. Here, we present a method for the detection of selection signatures on overlapping reading frames by using closely related sequences, and apply the method to several known overlapping genes, and to an overlapping ORF on the negative strand of segment 8 of influenza A virus (NEG8), which has been suggested to be functional. We find no evidence that NEG8 is under selection, suggesting that the intact reading frame might be non-functional, although we cannot fully exclude the possibility that the method is not sensitive enough to detect the signature of selection acting on this gene. We present the limitations of the method using known overlapping genes and suggest several approaches to improve it in future studies. Finally, we examine alternative explanations for the sequence conservation of NEG8 in the absence of selection. We show that overlap type and genomic context affect the conservation of intact overlapping ORFs and should therefore be considered in any attempt to estimate the signature of selection in overlapping genes.

6.
Owing to the degeneracy of the genetic code, protein-coding regions of mRNA sequences can harbour more than only amino acid information. We search the mRNA sequences of 11 human protein-coding genes for evolutionarily conserved secondary structure elements using RNA-Decoder, a comparative secondary structure prediction program that is capable of explicitly taking the known protein-coding context of the mRNA sequences into account. We detect well-defined, conserved RNA secondary structure elements in the coding regions of the mRNA sequences and show that base-paired codons strongly correlate with sparse codons. We also investigate the role of repetitive elements in the formation of secondary structure and explain the use of alternate start codons in the caveolin-1 gene by a conserved secondary structure element overlapping the nominal start codon. We discuss the functional roles of our novel findings in regulating gene expression at the mRNA level. We also investigate the role of secondary structure in the correct splicing of the human CFTR gene. We study the wild-type version of the pre-mRNA as well as 29 variants with synonymous mutations in exon 12. By comparing our predicted secondary structures to the experimentally determined splicing efficiencies, we find with weak statistical significance that pre-mRNAs with high splicing efficiencies have different predicted secondary structures than pre-mRNAs with low splicing efficiencies.

7.
8.
9.
10.

Background  

Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs).

11.
Messenger RNA (mRNA) processing plays important roles in gene expression in all domains of life. A number of cases of mRNA cleavage have been documented in Archaea, but available data are fragmentary. We have examined RNAs present in Methanocaldococcus (Methanococcus) jannaschii for evidence of RNA processing upstream of protein-coding genes. Of 123 regions covered by the data, 31 were found to be processed, with 30 including a cleavage site 12–16 nucleotides upstream of the corresponding translation start site. Analyses with 3′-RACE (rapid amplification of cDNA ends) and 5′-RACE indicate that the processing is endonucleolytic. Analyses of the sequences surrounding the processing sites for functional sites, sequence motifs, or potential RNA secondary structure elements did not reveal any recurring features except for an AUG translation start codon and (in most cases) a ribosome binding site. These properties differ from those of all previously described mRNA processing systems. Our data suggest that the processing alters the representation of various genes in the RNA pool and may therefore play a significant role in defining the balance of proteins in the cell.

12.
Sequence analysis of the RNA genome termini of various vesiculovirus standard and defective interfering (DI) particles demonstrated that some virus regulatory sequences and domains of virus N protein are highly conserved while others show considerable divergence. Clearly, distinct RNA signal sequences and protein-coding regions of these virus genomes have quite different evolutionary pressures or constraints. Terminal regions of DI-particle RNA genomes of these viruses were found to possess self-complementary stems at the RNA termini, demonstrating the conservation of this DI-particle structural feature throughout the vesiculovirus group. A high degree of conservation of the 3'-terminal sequences of recent and historic isolates of vesicular stomatitis virus New Jersey was also demonstrated.

13.
In viruses, an increased coding ability is provided by overlapping genes, in which two alternative open reading frames (ORFs) may be translated to yield two distinct proteins. The identification of signature sequences in overlapping genes is a topic of particular interest, since additional out-of-frame coding regions can be nested within known genes. In this work, a novel feature peculiar to overlapping coding regions is presented. It was detected by analysis of a sample set of 21 virus genomic sequences and consisted of the repeated occurrence of a cluster of basic amino acid residues, encoded by one frame, combined with a stretch of acidic residues, encoded by the corresponding overlapping frame. A computer scan of an additional set of virus sequences demonstrated that this feature is common to several other known overlapping ORFs and led to the prediction of a novel overlapping gene in hepatitis G virus (HGV). The occurrence of a bifunctional coding region in HGV was also supported by its much lower rate of synonymous nucleotide substitutions compared to that observed in the other gene regions of the HGV genome. Analysis of the amino acid sequence that was deduced from the putative overlapping gene revealed a high content of basic residues and the presence of a nuclear targeting signal; these characteristics suggest that a core-like protein may be expressed by this novel ORF. Received: 21 July 1999 / Accepted: 26 October 1999
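
The feature described in this abstract, basic residues in one frame paired with acidic residues in the overlapping frame, can be made concrete with a toy example. This is a minimal sketch rather than the authors' scanning procedure: it assumes the standard genetic code, treats only K and R as basic and D and E as acidic, and uses hypothetical helper names (`translate`, `net_charge`).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate `seq` starting at offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def net_charge(peptide: str) -> int:
    """Count basic residues (K, R) minus acidic residues (D, E); a simplifying assumption."""
    return sum(aa in "KR" for aa in peptide) - sum(aa in "DE" for aa in peptide)


# Toy sequence: AGA repeats read as arginine in frame 0 and as GAA (glutamate) in frame +1.
toy = "AGA" * 5
frame0, frame1 = translate(toy, 0), translate(toy, 1)
print(frame0, net_charge(frame0))  # RRRRR 5
print(frame1, net_charge(frame1))  # EEEE -4
```

The toy sequence shows why such a pairing is possible at all: a single nucleotide stretch can be strongly basic in one frame and strongly acidic in the frame shifted by one position.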

14.
We introduce a new approach in this article to distinguish protein-coding sequences from non-coding sequences utilizing a period-3, free energy signal that arises from the interactions of the 3′-terminal nucleotides of the 18S rRNA with mRNA. We extracted the special features of the amplitude and the phase of the period-3 signal in protein-coding regions, which are not found in non-coding regions, and used them to distinguish protein-coding sequences from non-coding sequences. We tested on all the experimental genes from Saccharomyces cerevisiae and Schizosaccharomyces pombe. The identification was consistent with the corresponding information from GenBank, and produced better performance compared to existing methods that use a period-3 signal. The primary tests on some fly, mouse and human genes suggest that our method is applicable to higher eukaryotic genes. The tests on pseudogenes indicated that most pseudogenes have no period-3 signal. Some exploration of the 3′-tail of 18S rRNA and pattern analysis of protein-coding sequences further supported our assumption that the 3′-tail of 18S rRNA has a synchronizing role throughout the translation elongation process. This, in turn, can be utilized for the identification of protein-coding sequences.
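
To illustrate what a period-3 signal is, the sketch below computes the spectral power of a nucleotide sequence at a period of three bases. Note that this is a swapped-in, simpler technique: it uses the classic base-indicator Fourier approach, not the rRNA–mRNA free energy signal that the abstract describes, and the function name `period3_power` is hypothetical.

```python
import cmath


def period3_power(seq: str) -> float:
    """Sum, over the four bases, of the squared DFT magnitude at frequency 1/3.

    Each base gets a 0/1 indicator sequence; the component at period 3 is
    sum_n u[n] * exp(-2*pi*i*n/3). Large values indicate a strong 3-base periodicity.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(component) ** 2
    return total


# A perfectly periodic toy sequence versus a homopolymer with no period-3 structure.
periodic = period3_power("ATG" * 10)
flat = period3_power("A" * 30)
```

In practice such a measure is usually evaluated in a sliding window and compared against a background level; the window size and threshold would be further assumptions that the abstract does not specify.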

15.
Molecular characterizations of bacteria often employ ribosomal DNA (rDNA) to establish the identity and relationships among organisms, but the use of rRNA sequences can be problematic as the result of alignment ambiguities caused by indels, the lack of informative characters, and varying functional constraints over the molecule. Although protein-coding regions have been used as an alternative to rRNA, there is neither consensus among the genes examined nor ways to rapidly obtain sequence information for such genes from uncharacterized bacterial species. To standardize the set of protein-coding loci assayed in bacterial genomes, we examined over 100 widely distributed genes to identify sets of universal primers for use in the PCR amplification of protein-coding regions that are common to virtually all bacteria. From this set, we developed primer sets targeting 10 genes spanning an array of genomic locations and functional categories. Although many of the primers contain sequence degeneracies that aid in targeting genes across diverse taxa, most are adequate for direct sequencing of amplification products, thereby eliminating intermediate cloning before sequence determination. We foresee the analysis of these protein-coding regions as being complementary to ribosomal DNA for answering questions pertaining to bacterial identification, classification, phylogenetics and evolution.

16.

Background  

Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similarity scores between target sequences and sequences of homology-matched reference strains is often time-consuming and ambiguous. This is especially true if only partial target sequences are available or if different ssRNA virus families are jointly analyzed. In such cases, knowledge of residues that uniquely distinguish among known reference strains is critical for rapid and unambiguous strain identification. Conventional sequence comparisons are unable to identify such capsid residues due to high sequence divergence among the ssRNA virus reference strains. Consequently, automated general methods to reliably identify strains using strain-distinguishing residues are not currently available.

17.
18.
19.
20.