Similar articles
 20 similar articles found (search time: 31 ms)
1.
The genomes of two positive-strand RNA viruses have recently been cloned from the serum of a GB agent-infected tamarin by using representational difference analysis. The two agents, GB viruses A and B (GBV-A and GBV-B, respectively), have genomes of 9,493 and 9,143 nucleotides, respectively, and single large open reading frames that encode potential polyprotein precursors of 2,972 and 2,864 amino acids, respectively. The genomes of these agents are organized much like those of other pestiviruses and flaviviruses, with genes predicted to encode structural and nonstructural proteins located at the 5' and 3' ends, respectively. Amino acid sequence alignments and subsequent phylogenetic analysis of the RNA-dependent RNA polymerases (RdRps) of GBV-A and GBV-B show that they possess conserved sequence motifs associated with supergroup II RNA polymerases of positive-strand RNA viruses. On the basis of similar analyses, the GBV-A- and GBV-B-encoded helicases show significant identity with the supergroup II helicases of positive-strand RNA viruses. Within the supergroup II RNA polymerases and helicases, GBV-A and GBV-B are most closely related to the hepatitis C virus group. Across their entire open reading frames, the GB agents exhibit 27% amino acid sequence identity to each other, approximately 28% identity to hepatitis C virus type 1, and approximately 20% identity to either bovine viral diarrhea virus or yellow fever virus. The degree of sequence divergence between GBV-A and GBV-B and other Flaviviridae members demonstrates that the GB agents are representatives of two new genera within the Flaviviridae family.

2.
GENIUS II is an automated database system in which open reading frames (ORFs) in complete genomes are assigned to known protein three-dimensional (3D) structures. The system uses the multiple intermediate sequence search method, in which query and target sequences are linked by intermediate sequences gathered by PSI-BLAST searches. When the system was applied to 129 complete genomes, an average of 43.8% of the ORFs in the genomes were assigned to known 3D structures. The results are freely available at the GENIUS II web site.

3.
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.

4.
The development of new systems and strategies capable of synthesizing any desired soluble, labeled protein or protein fragment on a preparative scale is one of the most important tasks in biotechnology today. The Center for Eukaryotic Structural Genomics (WI, USA), in co-operation with Ehime University (Matsuyama, Japan) and CellFree Sciences Co., Ltd, has developed an automated platform for nuclear magnetic resonance-based structural proteomics that employs wheat germ extracts for cell-free production of labeled protein. The platform utilizes a single construct for all targets without any redesign of the DNA or RNA. Therefore, it offers advantages over commercial cell-free methods utilizing Escherichia coli extracts that require multiple constructs or redesign of the open reading frame. The protein production and labeling protocol is no more costly than E. coli cell-based approaches, and it is robust and scalable for high-throughput applications. This protocol has been used in the authors' center to screen eukaryotic open reading frames from the Arabidopsis thaliana and human genomes and for the determination of nuclear magnetic resonance structures. With the recent addition of the GeneDecoder 1000 (CellFree Sciences Co., Ltd) robotic system, the Center for Eukaryotic Structural Genomics is able to carry out as many as 384 small-scale (50 µl) screening reactions per week. Furthermore, the Protemist (CellFree Sciences Co., Ltd) robotic system enables the Center for Eukaryotic Structural Genomics to carry out 16 production-scale (4 ml) reactions per week. Utilization of this automated platform technology to screen targets for expression and solubility and to produce stable isotope-labeled samples for nuclear magnetic resonance structure determinations is discussed.

5.
6.
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.

7.
We examine the translated open reading frames (ORFs) of the yeast Saccharomyces cerevisiae, focusing on those that have FASTA matches in phyletically defined sets of completely sequenced genomes. On this basis, we identify archaeal yeast, bacterial yeast, universal yeast, and yeast ORFs that do not have a match in any of nine prokaryote genomes. Similarly, we examine the yeast mitochondrial genome and the subset of the yeast nuclear ORFs identified as being involved in mitochondrial biogenesis. For the yeast ORFs that match one or more ORFs in these prokaryote genomes, we examine the phyletic and functional distributions of these matches as a function of match strength. These results provide genome level insights into the origin of the eukaryotic cell and the origin of mitochondria. More generally, they exemplify how the growing database of prokaryote genome sequences can help us understand eukaryote genomes.

8.
The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.

9.
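Small open reading frames (sORFs) are widespread in the genomes of diverse organisms. Because their sequences are short and the small proteins they encode (also called microproteins or miniproteins) are difficult to detect, sORFs have long been insufficiently annotated and studied. In recent years, with continued advances in high-throughput sequencing, translatomics, and mass spectrometry, large numbers of new sORFs have been discovered in different organisms, and the small proteins they encode and the translational regulation they mediate have been applied in research on drug development and on the mechanisms of plant disease resistance. However, research on and applications of sORFs in microorganisms remain relatively limited. This article reviews recent progress in the discovery and identification of sORF-encoded small proteins and in the regulation of mRNA translation by upstream open reading frames (uORFs), with an emphasis on the identification and functional study of sORFs in microbial genomes. It aims to provide a reference for a deeper understanding of the functions and mechanisms of sORFs in microorganisms, and for research on small proteins and translational regulation in plants, animals, and other higher organisms.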

10.
Identification of functional open reading frames in chloroplast genomes   Cited by: 7 (self-citations: 0, citations by others: 7)
K H Wolfe, P M Sharp. Gene, 1988, 66(2): 215-222
We have used a rapid computer dot-matrix comparison method to identify all DNA regions which have been evolutionarily conserved between the completely sequenced chloroplast genomes of tobacco and a liverwort. Analysis of these regions reveals 74 homologous open reading frames (ORFs) which have been conserved as to length and amino acid sequence; these ORFs also have an excess of nucleotide substitutions at silent sites of codons. Since the nonfunctional parts of these genomes have become saturated with mutations and show no sequence similarity whatsoever, the homologous ORFs are almost certainly functional. A further four pairs of ORFs show homology limited to only a short part of their putative gene products. Amino acid sequence identities range between 50 and 99%; some chloroplast proteins are seen to be among the most slowly evolving of all known proteins. A search of the nucleotide and amino acid sequence databanks has revealed several previously unidentified genes in chloroplast sequences from other species, but no new homologies to prokaryotic genes.

11.
Glutamate synthase (glutamine α-ketoglutarate amidotransferase, often abbreviated as GOGAT) is a key enzyme in the early stages of ammonia assimilation in bacteria, algae and plants, catalyzing the reductive transamidation of the amido nitrogen from glutamine to α-ketoglutarate to form two molecules of glutamate. Most bacterial glutamate synthases consist of a large and a small subunit. The genomes of three Pyrococcus species harbour several open reading frames which show homology with the small subunit of glutamate synthase. There are no open reading frames in these pyrococcal genomes which may code for a large subunit responsible for glutamate formation. In this work, two open reading frames, PH0876 and PH1873, from P. horikoshii were cloned and expressed in Escherichia coli as soluble proteins. Both proteins show NADPH-dependent oxidoreductase activity with the artificial electron acceptor iodonitrotetrazolium chloride under thermophilic conditions. It is possible that these open reading frames are the products of gene duplication and that they are the early forms of an electron transfer domain in archaea which may have later contributed to many electron transfer enzymes.

12.
Subtype ayw variant of hepatitis B virus. DNA primary structure analysis   Cited by: 14 (self-citations: 0, citations by others: 14)
The entire genome of human hepatitis B virus (HBV) occurring in Latvia was sequenced. This sequence, which is 3182 nucleotides long, was compared with the other previously published HBV genomes and was shown to share maximum homology with HBV subtype ayw DNA. The coordinates of 4 main open reading frames as well as hairpin structures are very well conserved in the two genomes. The distribution of nucleotide substitutions among different HBV genomes suggests that the open reading frames P and X can fulfil a coding function. On the basis of primary structure comparison for hepadnaviral DNAs several evolutionary conclusions can be drawn.

13.
An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
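
To make concrete the idea of predicting ORF lengths from base composition, the sketch below computes the expected ORF length under a deliberately simple null model. It is not the analytical model described in the abstract, which is not specified here. The assumptions are that nucleotides are drawn independently, that G and C are equally frequent, and that A and T are equally frequent; all function names are hypothetical.

```python
# Hypothetical null model (NOT the authors' analytical model): nucleotides are
# drawn independently, with P(G) = P(C) and P(A) = P(T) set by the GC content.

STOP_CODONS = ("TAA", "TAG", "TGA")


def base_probabilities(gc_content):
    """Return per-base probabilities for a sequence with the given GC fraction."""
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must lie strictly between 0 and 1")
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "T": at, "G": gc, "C": gc}


def stop_codon_probability(gc_content):
    """Probability that a random codon is one of the three stop codons."""
    probs = base_probabilities(gc_content)
    return sum(probs[c[0]] * probs[c[1]] * probs[c[2]] for c in STOP_CODONS)


def expected_orf_codons(gc_content):
    """Mean number of codons up to and including the first stop (geometric mean 1/p)."""
    return 1.0 / stop_codon_probability(gc_content)


def prob_orf_at_least(gc_content, n_codons):
    """Probability that a random reading frame runs n_codons codons without a stop."""
    return (1.0 - stop_codon_probability(gc_content)) ** n_codons


if __name__ == "__main__":
    # The abstract reports GC contents between 21% and 74%.
    for gc in (0.21, 0.50, 0.74):
        print(gc, round(expected_orf_codons(gc), 1))
```

Under this null model, stop codons are AT-rich, so GC-rich sequences are expected to carry longer random ORFs. An excess of long ORFs relative to such a baseline is the kind of deviation the abstract describes for the alternative reading frames.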

14.
In sequenced genomes, protein coding regions with unassigned function constitute between 10 and 50% of all open reading frames. Often key enzymes cannot be identified using sequence homology searches. For example, despite the fact that methanogens have an apparently functional gluconeogenesis pathway, standard tools have been unable to identify a fructose-1,6-bisphosphatase (FBPase) gene in the sequenced Methanococcus jannaschii genome. Using a combination of functional and structural tools, we have shown that the protein product of the M. jannaschii gene MJ0109, which had been tentatively annotated as an inositol monophosphatase (IMPase), has both IMPase and FBPase activities. Moreover, several gene products annotated as IMPases from different thermophilic organisms also possess FBPase activity. Thus, we have found the FBPase that was 'missing' in thermophiles and shown that it also functions as an IMPase.

15.
BACKGROUND: Endogenous retroviruses contribute to the evolution of the host genome and can be associated with disease. Human endogenous retrovirus K (HERV-K) is related to the mouse mammary tumor virus and is present in the genomes of humans, apes and cercopithecoids (Old World monkeys). It is unknown how long ago in primate evolution the full-length HERV-K proviruses that are in the human genome today were formed. RESULTS: Ten full-length HERV-K proviruses were cloned from the human genome. Using provirus-specific probes, eight of the ten were found to be present in a genetically diverse set of humans but not in other extant hominoids. Intact preintegration sites for each of these eight proviruses were present in the apes. A ninth provirus was detected in the human, chimpanzee, bonobo and gorilla genomes, but not in the orang-utan genome. The tenth was found only in humans, chimpanzees and bonobos. Complete sequencing of six of the human-specific proviruses showed that full-length open reading frames for the retroviral protein precursors Gag-Pro-Pol or Env were each present in multiple proviruses. CONCLUSIONS: At least eight full-length HERV-K genomes that are in the human germline today integrated after humans diverged from chimpanzees. All of the viral open reading frames and cis-acting sequences necessary for HERV-K replication must have been intact during the recent time when these proviruses formed. Multiple full-length open reading frames for all HERV-K proteins are present in the human genome today.

16.
We have determined the complete nucleotide sequence of an infectious cloned genome of ground squirrel hepatitis virus (GSHV), a nonpathogenic member of the hepadnavirus group. The genome is 3,311 base pairs long and contains the major open reading frames described for the related human and woodchuck hepatitis B viruses (HBV and WHV, respectively). These reading frames include genes for the major structural proteins (the surface and core antigens), unassigned open reading frames (A and B), the longer of which is presumed to encode the viral DNA polymerase, and an open reading frame preceding and continuous with the surface antigen gene. The arrangement of these open reading frames is similar to that encountered in the genomes of HBV and WHV: all of the reading frames are encoded on the same strand, they are positioned in the same fashion with respect to each other, and a large portion (at least 51%) of the genome can be translated in two reading frames. Comparisons of the predicted translational products of the three mammalian hepadnaviruses reveal 78% amino acid homology between the proteins of GSHV and WHV and 43% homology between those of GSHV and HBV. In addition, a perfect direct repeat of 10 to 11 base pairs, separated by ca. 46 to 223 base pairs, is present in the three mammalian viruses and in duck hepatitis B virus; the position of the repeats near the 5' termini of the two strands of virion DNA suggests a role in viral replication.

17.
Simple sequence repeats in the Helicobacter pylori genome   Cited by: 5 (self-citations: 4, citations by others: 1)
We describe an integrated system for the analysis of DNA sequence motifs within complete bacterial genome sequences. This system is based around ACeDB, a genome database with an integrated graphical user interface; we identify and display motifs in the context of genetic, sequence and bibliographic data. Tomb et al. (1997) previously reported the identification of contingency genes in Helicobacter pylori through their association with homopolymeric tracts and dinucleotide repeats. With this as a starting point, we validated the system by a search for this type of repeat and used the contextual information to assess the likelihood that they mediate phase variation in the associated open reading frames (ORFs). We found all of the repeats previously described, and identified 27 putative phase-variable genes (including 17 previously described). These could be divided into three groups: lipopolysaccharide (LPS) biosynthesis, cell-surface-associated proteins and DNA restriction/modification systems. Five of the putative genes did not have obvious homologues in any of the public domain sequence databases. The reading frame of some ORFs was disrupted by the presence of the repeats, including the alpha(1-2) fucosyltransferase gene, necessary for the synthesis of the Lewis Y epitope. An additional benefit of this approach is that the results of each search can be analysed further and compared with those from other genomes. This revealed that H. pylori has an unusually high frequency of homopurine:homopyrimidine repeats, suggesting mechanistic biases that favour their presence and instability.

18.
19.
One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

20.
How will bioinformatics influence metabolic engineering?   Cited by: 5 (self-citations: 0, citations by others: 5)
Ten microbial genomes have been fully sequenced to date, and the sequencing of many more genomes is expected to be completed before the end of the century. The assignment of function to open reading frames (ORFs) is progressing, and for some genomes over 70% of functional assignments have been made. The majority of the assigned ORFs relate to metabolic functions. Thus, the complete genetic and biochemical functions of a number of microbial cells may soon be available. From a metabolic engineering standpoint, these developments open a new realm of possibilities. Metabolic analysis and engineering strategies can now be built on a sound genomic basis. An important question now arises: how should these tasks be approached? Flux-balance analysis (FBA) has the potential to play an important role. It is based on the fundamental principle of mass conservation. It requires only the stoichiometric matrix, the metabolic demands, and some strain-specific parameters. Importantly, no enzymatic kinetic data are required. In this article, we show how the genomically defined microbial metabolic genotypes can be analyzed by FBA. Fundamental concepts of metabolic genotype, metabolic phenotype, metabolic redundancy and robustness are defined and examples of their use given. We discuss the advantage of this approach, and how FBA is expected to find uses in the near future. FBA is likely to become an important analysis tool for genomically based approaches to metabolic engineering, strain design, and development.
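
For orientation, the optimization usually associated with flux-balance analysis can be written as a linear program. The formulation below is the common textbook form and is not taken from the abstract; the symbols $S$, $v$, $c$, $v^{\min}$ and $v^{\max}$ are notation assumed here.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for each reaction } j .
\end{aligned}
```

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, and $S\,v = 0$ expresses mass conservation at steady state. The bounds would carry the metabolic demands and strain-specific parameters, and $c$ selects the flux or combination of fluxes to be optimized. No kinetic parameters appear anywhere in the problem, which is why no enzymatic kinetic data are needed.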
