Similar Literature
 20 similar documents found (search time: 27 ms)
1.
We developed a semi-automated genome analysis system called GAMBLER to support the current whole-genome sequencing project focusing on the alkaliphilic Bacillus halodurans C-125. GAMBLER was designed to reduce the human intervention required, and the complications involved, in annotating thousands of ORFs in a microbial genome. GAMBLER automates three major routines: analyzing assembly results provided by genome assembler software, assigning ORFs, and homology searching. GAMBLER is equipped with an interface for convenient annotation. All processes and options can be controlled through a WWW browser, which enables scientists to share their genome analysis results regardless of computer platform.
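The abstract does not say how GAMBLER assigns ORFs. As an illustration only, an ORF-assignment step can be sketched as a scan for ATG...stop stretches in the three forward reading frames. The function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand are assumptions made for this sketch, not features of GAMBLER.

```python
# Hypothetical sketch of ORF assignment; not GAMBLER's actual method.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts every codon from ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon; only complete codons are read.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

A real annotation pipeline would also scan the reverse complement and would typically apply a much larger length threshold before passing candidates to homology searching.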

2.
Previous studies of the avian reovirus strain S1133 (ARV-S1133) S1 genome segment revealed that the open reading frame (ORF) encoding the σC viral cell attachment protein initiates over 600 nucleotides distal from the 5' end of the S1 mRNA and is preceded by two predicted small nonoverlapping ORFs. To more clearly define the translational properties of this unusual polycistronic RNA, we pursued a comparative analysis of the S1 genome segment of the related Nelson Bay reovirus (NBV). Sequence analysis indicated that the 3'-proximal ORF present on the NBV S1 genome segment also encodes a σC homolog, as evidenced by the presence of an extended N-terminal heptad repeat characteristic of the coiled-coil region common to the cell attachment proteins of reoviruses. Most importantly, the NBV S1 genome segment contains two conserved ORFs upstream of the σC coding region that are extended relative to the predicted ORFs of ARV-S1133 and are arranged in a sequential, partially overlapping fashion. Sequence analysis of the S1 genome segments of two additional strains of ARV indicated a similar overlapping tricistronic gene arrangement as predicted for the NBV S1 genome segment. Expression analysis of the ARV S1 genome segment indicated that all three ORFs are functional in vitro and in virus-infected cells. In addition to the previously described p10 and σC gene products, the S1 genome segment encodes from the central ORF a 17-kDa basic protein (p17) of no known function. Optimizing the translation start site of the ARV p10 ORF led to an approximately 15-fold increase in p10 expression with little or no effect on translation of the downstream σC ORF. These results suggest that translation initiation complexes can bypass over 600 nucleotides and two functional overlapping upstream ORFs in order to access the distal σC start site.

3.
The complete genome of Cnaphalocrocis medinalis granulovirus (CnmeGV) from a serious migratory rice pest, Cnaphalocrocis medinalis (Lepidoptera: Pyralidae), was sequenced using the Roche 454 Genome Sequencer FLX system (GS FLX) with a shotgun strategy and assembled with the Roche GS De Novo assembler software. Its circular double-stranded genome is 111,246 bp in size, with a high A+T content of 64.8%, and codes for 118 putative open reading frames (ORFs). It contains 37 conserved baculovirus core ORFs, 13 unique ORFs, 26 ORFs that were found in all Lepidoptera baculoviruses and 42 common ORFs. The analysis of nucleotide sequence repeats revealed that the CnmeGV genome differs from the other sequenced GVs by inversions of a 23 kb and a 17 kb gene block, and does not contain any typical homologous region (hr) except for a region of non-hr-like sequence. Chitinase and cathepsin genes, which are reported to have major roles in the liquefaction of the hosts, were not found in the CnmeGV genome, which explains why CnmeGV-infected insects do not show the typical liquefaction phenotype. Phylogenetic analysis, based on the 37 core baculovirus genes, indicates that CnmeGV is closely related to Adoxophyes orana granulovirus. The genome analysis should contribute to functional research on CnmeGV and benefit its use as a pest control agent in rice production.

4.
5.
The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points of ORFs or other interesting phenomena) along the genome. We use piecewise constant intensity models with a varying number of pieces, and show how a reversible jump Markov Chain Monte Carlo (RJMCMC) method can be used to obtain a posterior distribution on the intensity of the patterns along the genome. We apply the method to modeling the occurrence of ORFs in the human genome. The results show that the chromosomes consist of 5-35 clearly distinct segments, and that the posterior number and length of the segments show significant variation. On the other hand, for the yeast genome the intensity of ORFs is nearly constant.
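A full RJMCMC sampler is beyond a short example, but the quantity such a sampler would evaluate for each proposed segmentation can be sketched: the Poisson-process log-likelihood of event positions under a piecewise constant intensity, sum over segments of n_k log(lambda_k) - lambda_k * length_k. The function names, the half-open segment convention, and the toy data below are assumptions for illustration; the abstract does not give the authors' priors or proposal moves.

```python
import math
from bisect import bisect_left


def segment_counts(positions, breakpoints, length):
    """Count events in each half-open segment [lo, hi) of [0, length)."""
    edges = [0] + sorted(breakpoints) + [length]
    pts = sorted(positions)
    counts = [
        bisect_left(pts, hi) - bisect_left(pts, lo)
        for lo, hi in zip(edges, edges[1:])
    ]
    return edges, counts


def log_likelihood(positions, breakpoints, intensities, length):
    """Log-likelihood of a piecewise constant Poisson intensity.

    intensities[k] is the (strictly positive) rate on segment k.
    """
    edges, counts = segment_counts(positions, breakpoints, length)
    total = 0.0
    for lo, hi, n, lam in zip(edges, edges[1:], counts, intensities):
        total += n * math.log(lam) - lam * (hi - lo)
    return total


# Toy data (hypothetical): three ORF starts near the origin, one far away.
starts = [1, 2, 3, 50]
two_piece = log_likelihood(starts, [10], [3 / 10, 1 / 90], 100)
one_piece = log_likelihood(starts, [], [4 / 100], 100)
```

On this toy input the two-piece model scores higher than the constant model, which is the kind of comparison a reversible jump move between model dimensions has to weigh against the prior on the number of pieces.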

6.
At 346 kbp in size, the genome of the jumbo bacteriophage vB_KleM-RaK2 (RaK2) is the largest Klebsiella-infecting myovirus genome sequenced to date. In total, 272 out of 534 RaK2 ORFs lack detectable database homologues. Based on similarity to biologically defined proteins and/or MS/MS analysis, 117 RaK2 ORFs were given a functional annotation, including 28 RaK2 ORFs coding for structural proteins that have no reliable homologues among annotated structural proteins in other organisms. The electron micrographs revealed elaborate spike-like structures on the tail fibers of RaK2, suggesting that this phage is an atypical myovirus. While the head and tail proteins of RaK2 are mostly Myoviridae-related, the bioinformatics analysis indicates that the tail fibers/spikes of this phage are formed predominantly from podovirus-like peptides. Overall, these results provide evidence that bacteriophage RaK2 differs profoundly from previously studied viruses of the Myoviridae family.

7.
Bacteriophage M102 is a lytic phage specific for serotype c strains of Streptococcus mutans, a causative agent of dental caries. In this study, the complete genome sequence of M102 was determined. The genome is 31,147 bp in size and contains 41 ORFs. Most of the ORFs encoding putative phage structural proteins show similarity to those from bacteriophages from Streptococcus thermophilus. Bioinformatic analysis indicated that the M102 genome contains an unusual lysis cassette, which encodes a holin and two lytic enzymes.

8.
We examine the translated open reading frames (ORFs) of the yeast Saccharomyces cerevisiae, focusing on those that have FASTA matches in phyletically defined sets of completely sequenced genomes. On this basis, we identify archaeal yeast, bacterial yeast, universal yeast, and yeast ORFs that do not have a match in any of nine prokaryote genomes. Similarly, we examine the yeast mitochondrial genome and the subset of the yeast nuclear ORFs identified as being involved in mitochondrial biogenesis. For the yeast ORFs that match one or more ORFs in these prokaryote genomes, we examine the phyletic and functional distributions of these matches as a function of match strength. These results provide genome level insights into the origin of the eukaryotic cell and the origin of mitochondria. More generally, they exemplify how the growing database of prokaryote genome sequences can help us understand eukaryote genomes.

9.
Escherichia coli, including the closely related genus Shigella, is a highly diverse species in terms of genome structure. Comparative genomic hybridization (CGH) microarray analysis was used to compare the gene content of E. coli K-12 with the gene contents of pathogenic strains. Missing genes in a pathogen were detected on a microarray slide spotted with 4,071 open reading frames (ORFs) of W3110, a commonly used wild-type K-12 strain. For 22 strains subjected to the CGH microarray analyses 1,424 ORFs were found to be absent in at least one strain. The common backbone of the E. coli genome was estimated to contain about 2,800 ORFs. The mosaic distribution of absent regions indicated that the genomes of pathogenic strains were highly diversified because of insertions and deletions. Prophages, cell envelope genes, transporter genes, and regulator genes in the K-12 genome often were not present in pathogens. The gene contents of the strains tested were recognized as a matrix for a neighbor-joining analysis. The phylogenic tree obtained was consistent with the results of previous studies. However, unique relationships between enteroinvasive strains and Shigella, uropathogenic, and some enteropathogenic strains were suggested by the results of this study. The data demonstrated that the CGH microarray technique is useful not only for genomic comparisons but also for phylogenic analysis of E. coli at the strain level.
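The step of treating gene content as a matrix for tree building can be illustrated by turning presence/absence calls into pairwise distances, which is the input a neighbor-joining routine expects. The abstract does not state which distance measure was used, so the fraction of ORFs whose calls differ (a normalized Hamming distance) is an assumption here, as are the function name and the toy strain profiles.

```python
def hamming_distance_matrix(profiles):
    """Pairwise fraction of ORFs whose presence calls differ.

    profiles maps a strain name to a list of 0/1 calls, one per ORF;
    all lists must have the same length and ORF order.
    """
    names = sorted(profiles)
    n_orfs = len(profiles[names[0]])
    dist = {}
    for a in names:
        for b in names:
            diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
            dist[(a, b)] = diff / n_orfs
    return names, dist


# Hypothetical calls for four ORFs; the reference strain has all of them.
toy_profiles = {
    "K-12": [1, 1, 1, 1],
    "strain_A": [1, 0, 1, 1],
    "strain_B": [0, 0, 1, 1],
}
names, dist = hamming_distance_matrix(toy_profiles)
```

Because the array is built from K-12 ORFs only, genes present in a pathogen but absent from K-12 are invisible to this kind of matrix, which is worth keeping in mind when reading distances from it.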

10.
11.
Complete sequence and genomic analysis of murine gammaherpesvirus 68.
Murine gammaherpesvirus 68 (gammaHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gammaHV68 pathogenesis, we have sequenced the gammaHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of the genome is 46%, while the GC content of the terminal repeat is 78%. The unique portion of the genome is estimated to encode at least 80 genes and is largely colinear with the genomes of Kaposi's sarcoma herpesvirus (KSHV; also known as human herpesvirus 8), herpesvirus saimiri (HVS), and Epstein-Barr virus (EBV). We detected 63 open reading frames (ORFs) homologous to HVS and KSHV ORFs and used the HVS/KSHV numbering system to designate these ORFs. gammaHV68 shares with HVS and KSHV ORFs homologous to a complement regulatory protein (ORF 4), a D-type cyclin (ORF 72), and a G-protein-coupled receptor with close homology to the interleukin-8 receptor (ORF 74). One ORF (K3) was identified in gammaHV68 as homologous to both ORFs K3 and K5 of KSHV and contains a domain found in a bovine herpesvirus 4 major immediate-early protein. We also detected 16 methionine-initiated ORFs predicted to encode proteins at least 100 amino acids in length that are unique to gammaHV68 (ORFs M1 to 14). ORF M1 has striking homology to poxvirus serpins, while ORF M11 encodes a potential homolog of Bcl-2-like molecules encoded by other gammaherpesviruses (gene 16 of HVS and KSHV and the BHRF1 gene of EBV). In addition, clustered at the left end of the unique region are eight sequences with significant homology to bacterial tRNAs. The unique region of the genome contains two internal repeats: a 40-bp repeat located between bp 26778 and 28191 in the genome and a 100-bp repeat located between bp 98981 and 101170. 
Analysis of the gammaHV68, HVS, EBV, and KSHV genomes demonstrated that each of these viruses has large colinear gene blocks interspersed with regions containing virus-specific ORFs. Interestingly, genes associated with EBV cell tropism, latency, and transformation are all contained within these regions encoding virus-specific genes. This finding suggests that pathogenesis-associated genes of gammaherpesviruses, including gammaHV68, may be contained in similarly positioned genome regions. The availability of the gammaHV68 genomic sequence will facilitate analysis of critical issues in gammaherpesvirus biology via integration of molecular and pathogenetic studies in a small-animal model.

12.
The complete genome sequences of two Sulfolobus spindle-shaped viruses (SSVs) from acidic hot springs in Kamchatka (Russia) and Yellowstone National Park (United States) have been determined. These nonlytic temperate viruses were isolated from hyperthermophilic Sulfolobus hosts, and both viruses share the spindle-shaped morphology characteristic of the Fuselloviridae family. These two genomes, in combination with the previously determined SSV1 genome from Japan and the SSV2 genome from Iceland, have allowed us to carry out a phylogenetic comparison of these geographically distributed hyperthermal viruses. Each virus contains a circular double-stranded DNA genome of approximately 15 kbp with approximately 34 open reading frames (ORFs). These Fusellovirus ORFs show little or no similarity to genes in the public databases. In contrast, 18 ORFs are common to all four isolates and may represent the minimal gene set defining this viral group. In general, ORFs on one half of the genome are colinear and highly conserved, while ORFs on the other half are not. One shared ORF among all four genomes is an integrase of the tyrosine recombinase family. All four viral genomes integrate into their host tRNA genes. The specific tRNA gene used for integration varies, and one genome integrates into multiple loci. Several unique ORFs are found in the genome of each isolate.

13.
The genome of the metal sulfide-oxidizing, thermoacidophilic strain Metallosphaera cuprina Ar-4 has been completely sequenced and annotated. Originally isolated from a sulfuric hot spring, strain Ar-4 grows optimally at 65°C and a pH of 3.5. The M. cuprina genome has a 1,840,348-bp circular chromosome (2,029 open reading frames [ORFs]) and is 16% smaller than the previously sequenced Metallosphaera sedula genome. About 480 ORFs in the M. sedula genome have no counterpart genes in the M. cuprina genome; 243 of these are annotated as hypothetical protein genes. Conversely, 233 ORFs occur only in M. cuprina. Genome annotation supports the view that M. cuprina lives facultatively on CO2 and organics and obtains energy from the oxidation of sulfidic ores and reduced inorganic sulfuric compounds.

14.
15.
Past analyses of the genome of the yeast Saccharomyces cerevisiae have revealed substantial regional variation in G+C content. Important questions remain, though, as to the origin, nature, significance, and generality of this variation. We conducted an extensive analysis of the yeast genome to try to answer these questions. Our results indicate that open reading frames (ORFs) with similar G+C contents at silent codon positions are significantly clustered on chromosomes. This clustering can be explained by very short range correlations of silent-site G+C contents at neighboring ORFs. ORFs of high silent-site G+C content are disproportionately concentrated on shorter chromosomes, which causes a negative relationship between chromosome length and G+C content. Contrary to previous reports, there is no correlation between gene density and silent-site G+C content in yeast. Chromosome III is atypical in many regards, and possible reasons for this are discussed.
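Silent-site G+C content is often approximated by the G+C fraction at third codon positions (GC3). The abstract does not give the authors' exact definition of a silent site, so the third-position shortcut below is an assumption, and the function name `gc3` is chosen only for this sketch.

```python
def gc3(orf):
    """Fraction of G or C at third codon positions of an in-frame ORF.

    Approximates silent-site G+C content; trailing bases that do not
    complete a codon are ignored.
    """
    orf = orf.upper()
    thirds = orf[2::3]  # bases at indices 2, 5, 8, ... (codon position 3)
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

Computing this value per ORF and comparing neighbors along a chromosome is one simple way to look for the short-range clustering the abstract describes.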

16.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95°C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S–23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).
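The ORF counts and percentages in this abstract can be checked with a few lines of arithmetic. The counts come from the abstract; the category labels and variable names are just for this sketch.

```python
total_orfs = 2694
categories = {
    "putative function": 633,
    "registered, unknown function": 523,
    "no significant similarity": 1538,
}

# The three categories should account for every assigned ORF.
categories_sum = sum(categories.values())

# Percentage of all ORFs in each category, rounded to one decimal place.
percentages = {
    name: round(100 * count / total_orfs, 1)
    for name, count in categories.items()
}
```

The three counts sum to 2,694 and the rounded shares come out at 23.5, 19.4 and 57.1, matching the figures quoted in the abstract.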

17.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously-identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.

18.
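The sequence of the BamHI-H fragment of the Buzura suppressaria single-nucleocapsid nucleopolyhedrovirus (BusuNPV) genome was analyzed. The fragment is 2,422 bp long and contains three open reading frames: the 5′ end of the p47 gene (homologous to AcMNPV ORF40), a complete cathepsin gene (homologous to AcMNPV ORF127), and the 3′ end of the p74 gene (homologous to AcMNPV ORF138). Comparative sequence analysis showed that these three BusuNPV genes share the same conserved structural regions as their homologues in other baculoviruses. The order of these three genes on the BamHI-H fragment of the BusuNPV genome is completely different from that of the corresponding genes in AcMNPV.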

19.
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

20.
Bacteriophage S-PM2 infects several strains of the abundant and ecologically important marine cyanobacterium Synechococcus. A large lytic phage with an isometric icosahedral head, S-PM2 has a contractile tail and by this criterion is classified as a myovirus (1). The linear, circularly permuted, 196,280-bp double-stranded DNA genome of S-PM2 contains 37.8% G+C residues. It encodes 239 open reading frames (ORFs) and 25 tRNAs. Of these ORFs, 19 appear to encode proteins associated with the cell envelope, including a putative S-layer-associated protein. Twenty additional S-PM2 ORFs have homologues in the genomes of their cyanobacterial hosts. There is a group I self-splicing intron within the gene encoding the D1 protein. A total of 40 ORFs, organized into discrete clusters, encode homologues of T4 proteins involved in virion morphogenesis, nucleotide metabolism, gene regulation, and DNA replication and repair. The S-PM2 genome encodes a few surprisingly large (e.g., 3,779 amino acids) ORFs of unknown function. Our analysis of the S-PM2 genome suggests that many of the unknown S-PM2 functions may be involved in the adaptation of the metabolism of the host cell to the requirements of phage infection. This hypothesis originates from the identification of multiple phage-mediated modifications of the host's photosynthetic apparatus that appear to be essential for maintaining energy production during the lytic cycle.
