Similar literature
 20 similar articles found (search time: 31 ms)
1.
Identification of functional open reading frames in chloroplast genomes   (Total citations: 7; self-citations: 0; citations by others: 7)
K H Wolfe  P M Sharp 《Gene》1988,66(2):215-222
We have used a rapid computer dot-matrix comparison method to identify all DNA regions which have been evolutionarily conserved between the completely sequenced chloroplast genomes of tobacco and a liverwort. Analysis of these regions reveals 74 homologous open reading frames (ORFs) which have been conserved as to length and amino acid sequence; these ORFs also have an excess of nucleotide substitutions at silent sites of codons. Since the nonfunctional parts of these genomes have become saturated with mutations and show no sequence similarity whatsoever, the homologous ORFs are almost certainly functional. A further four pairs of ORFs show homology limited to only a short part of their putative gene products. Amino acid sequence identities range between 50 and 99%; some chloroplast proteins are seen to be among the most slowly evolving of all known proteins. A search of the nucleotide and amino acid sequence databanks has revealed several previously unidentified genes in chloroplast sequences from other species, but no new homologies to prokaryotic genes.

2.
The capsid of cytomegalovirus contains an abundant, low-molecular-weight protein whose coding sequence within the viral genome had not been identified. We have used a combination of biochemical and immunological techniques to demonstrate that this protein, called the smallest capsid protein in human cytomegalovirus, is encoded by a previously unidentified 225-bp open reading frame (ORF) located between ORFs UL48 and UL49. This short ORF, called UL48/49, is the positional homolog of herpes simplex virus ORF UL35 (encoding capsid protein VP26) and shows partial amino acid sequence identity to positional homologs in human herpes viruses 6 and 7.

3.
Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses include the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements.

4.
5.
Simple sequence repeats in the Helicobacter pylori genome   (Total citations: 5; self-citations: 4; citations by others: 1)
We describe an integrated system for the analysis of DNA sequence motifs within complete bacterial genome sequences. This system is based around ACeDB, a genome database with an integrated graphical user interface; we identify and display motifs in the context of genetic, sequence and bibliographic data. Tomb et al. (1997) previously reported the identification of contingency genes in Helicobacter pylori through their association with homopolymeric tracts and dinucleotide repeats. With this as a starting point, we validated the system by a search for this type of repeat and used the contextual information to assess the likelihood that they mediate phase variation in the associated open reading frames (ORFs). We found all of the repeats previously described, and identified 27 putative phase-variable genes (including 17 previously described). These could be divided into three groups: lipopolysaccharide (LPS) biosynthesis, cell-surface-associated proteins and DNA restriction/modification systems. Five of the putative genes did not have obvious homologues in any of the public domain sequence databases. The reading frame of some ORFs was disrupted by the presence of the repeats, including the alpha(1-2) fucosyltransferase gene, necessary for the synthesis of the Lewis Y epitope. An additional benefit of this approach is that the results of each search can be analysed further and compared with those from other genomes. This revealed that H. pylori has an unusually high frequency of homopurine:homopyrimidine repeats suggesting mechanistic biases that favour their presence and instability.

6.
Zhang QY  Xiao F  Xie J  Li ZQ  Gui JF 《Journal of virology》2004,78(13):6982-6994
Lymphocystis diseases in fish throughout the world have been extensively described. Here we report the complete genome sequence of lymphocystis disease virus isolated in China (LCDV-C), an LCDV isolated from cultured flounder (Paralichthys olivaceus) with lymphocystis disease in China. The LCDV-C genome is 186,250 bp, with a base composition of 27.25% G+C. Computer-assisted analysis revealed 240 potential open reading frames (ORFs) and 176 nonoverlapping putative viral genes, which encode polypeptides ranging from 40 to 1,193 amino acids. The percent coding density is 67%, and the average length of each ORF is 702 bp. A search of the GenBank database using the 176 individual putative genes revealed 103 homologues to the corresponding ORFs of LCDV-1 and 73 potential genes that were not found in LCDV-1 and other iridoviruses. Among the 73 genes, there are 8 genes that contain conserved domains of cellular genes and 65 novel genes that do not show any significant homology with the sequences in public databases. Although a certain extent of similarity between putative gene products of LCDV-C and corresponding proteins of LCDV-1 was revealed, no colinearity was detected when their ORF arrangements and coding strategies were compared to each other, suggesting that a high degree of genetic rearrangements between them has occurred. And a large number of tandem and overlapping repeated sequences were observed in the LCDV-C genome. The deduced amino acid sequence of the major capsid protein (MCP) presents the highest identity to those of LCDV-1 and other iridoviruses among the LCDV-C gene products. Furthermore, a phylogenetic tree was constructed based on the multiple alignments of nine MCP amino acid sequences. Interestingly, LCDV-C and LCDV-1 were clustered together, but their amino acid identity is much less than that in other clusters. The unexpected levels of divergence between their genomes in size, gene organization, and gene product identity suggest that LCDV-C and LCDV-1 shouldn't belong to a same species and that LCDV-C should be considered a species different from LCDV-1.

7.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously-identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
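The idea in the abstract above, that a single polymorphism can disrupt an ORF or create one, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_orfs` and `apply_snp`, the toy sequence, and the tiny `min_codons` threshold are all assumptions made for illustration, and the sketch scans only the forward strand of an already-spliced sequence without any genome alignment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(mrna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive, and the span includes the stop
    codon. min_codons counts codons from ATG up to, but excluding, the stop.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume scanning for the next start codon
    return orfs


def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical toy "reference" transcript: ATG GCT TGG AAA TAA in frame 2.
reference = "CCATGGCTTGGAAATAAGG"

# A G->A change turns TGG into the stop codon TGA and truncates the ORF
# below the length threshold, so it is no longer reported.
disrupted = apply_snp(reference, 10, "A")

# A G->C change destroys the start codon; reversing it "creates" the ORF.
no_start = apply_snp(reference, 4, "C")
created = apply_snp(no_start, 4, "G")

print(find_orfs(reference))  # [(2, 17, 2)]
print(find_orfs(disrupted))  # []
print(find_orfs(no_start))   # []
print(find_orfs(created))    # [(2, 17, 2)]
```

A realistic analysis would use a much larger length cutoff (the abstract speaks of "long" ORFs), consider both strands where appropriate, and take variant positions from a polymorphism database rather than hand-picked coordinates.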

8.
Histones are highly basic, relatively small proteins that complex with DNA to form higher order structures that underlie chromosome topology. Of the four core histones H2A, H2B, H3 and H4, it is H3 that is most heavily modified at the post-translational level. The human genome harbours 16 annotated bona fide histone H3 genes which code for four H3 protein variants. In 2010, two novel histone H3.3 protein variants were reported, carrying over twenty amino acid substitutions. Nevertheless, they appear to be incorporated into chromatin. Interestingly, these new H3 genes are located on human chromosome 5 in a repetitive region that harbours an additional five H3 pseudogenes, but no other core histone ORFs. In addition, a human-specific novel putative histone H3.3 variant located at 12p11.21 was reported in 2011. These developments raised the question as to how many more human histone H3 ORFs there may be. Using homology searches, we detected 41 histone H3 pseudogenes in the current human genome assembly. The large majority are derived from the H3.3 gene H3F3A, and three of those may code for yet more histone H3.3 protein variants. We also identified one extra intact H3.2-type variant ORF in the vicinity of the canonical HIST2 gene cluster at chromosome 1p21.2. RNA polymerase II occupancy data revealed heterogeneity in H3 gene expression in human cell lines. None of the novel H3 genes were significantly occupied by RNA polymerase II in the data sets at hand, however. We discuss the implications of these recent developments.

9.
Sequence analysis of the simian foamy virus type 1 genome.   (Total citations: 11; self-citations: 0; citations by others: 11)
J J Kupiec  A Kay  M Hayat  R Ravier  J Périès  F Galibert 《Gene》1991,101(2):185-194

10.
11.
12.
Cytomegaloviruses are highly host restricted, resulting in cospeciation with their hosts. As a natural pathogen of rhesus macaques (RM), rhesus cytomegalovirus (RhCMV) has therefore emerged as a highly relevant experimental model for pathogenesis and vaccine development due to its close evolutionary relationship to human CMV (HCMV). Most in vivo experiments performed with RhCMV employed strain 68-1 cloned as a bacterial artificial chromosome (BAC). However, the complete genome sequence of the 68-1 BAC has not been determined. Furthermore, the gene content of the RhCMV genome is unknown, and previous open reading frame (ORF) predictions relied solely on uninterrupted ORFs with an arbitrary cutoff of 300 bp. To obtain a more precise picture of the actual proteins encoded by the most commonly used molecular clone of RhCMV, we reevaluated the RhCMV 68-1 BAC genome by whole-genome shotgun sequencing and determined the protein content of the resulting RhCMV virions by proteomics. By comparing the RhCMV genome to those of several related Old World monkey (OWM) CMVs, we were able to filter out many unlikely ORFs and obtain a simplified map of the RhCMV genome. This comparative genomics analysis suggests a high degree of ORF conservation among OWM CMVs, thus decreasing the likelihood that ORFs found only in RhCMV comprise true genes. Moreover, virion proteomics independently validated the revised ORF predictions, since only proteins that were conserved across OWM CMVs could be detected. Taken together, these data suggest a much higher conservation of genome and virion structure between CMVs of humans, apes, and OWMs than previously assumed.

13.
14.
P Li  B Chen  Z Song  Y Song  Y Yang  P Ma  H Wang  J Ying  P Ren  L Yang  G Gao  S Jin  Q Bao  H Yang 《Gene》2012,507(2):125-134
As one of the pathogens of hospital-acquired infections, Acinetobacter baumannii poses great challenges to the public health. A. baumannii phage could be an effective way to fight multi-resistant A. baumannii. Here, we completed the whole genome sequencing of the complete genome of A. baumannii phage AB1, which consists of 45,159 bp and is a double-stranded DNA molecule with an average GC content of 37.7%. The genome encodes one tRNA gene and 85 open reading frames (ORFs) and the average size of the ORF is 531 bp in length. Among 85 ORFs, only 14 have been identified to share significant sequence similarities to the genes with known functions, while 28 are similar in sequence to the genes with function-unknown genes in the database and 43 ORFs are uniquely present in the phage AB1 genome. Fourteen function-assigned genes with putative functions include five phage structure proteins, an RNA polymerase, a big sub-unit and a small sub-unit of a terminase, a methylase and a recombinase and the proteins involved in DNA replication and so on. Multiple sequence alignment was conducted among those homologous proteins and the phylogenetic trees were reconstructed to analyze the evolutionary courses of these essential genes. From comparative genomics analysis, it turned out clearly that the frame of the phage genome mainly consisted of genes from Xanthomonas phages, Burkholderia ambifaria phages and Enterobacteria phages and while it comprises genes of its host A. baumannii only sporadically. The mosaic feature of the phage genome suggested that the horizontal gene transfer occurred among the phage genomes and between the phages and the host bacterium genomes. Analyzing the genome sequences of the phages should lay sound foundation to investigate how phages adapt to the environment and infect their hosts, and even help to facilitate the development of biological agents to deal with pathogenic bacteria.

15.
Hu  Xu  Reddy  A.S.N. 《Plant molecular biology》1997,34(6):949-959
Pathogenesis-related (PR)-5 proteins are a family of proteins that are induced by different phytopathogens in many plants and share significant sequence similarity with thaumatin. We isolated a complementary DNA (ATLP-3) encoding a PR5-like protein from Arabidopsis which is distinct from two other previously reported PR5 cDNAs from the same plant species. The predicted ATLP-3 protein with its amino-terminal signal sequence is 245 amino acids in length and is acidic with a pI of 4.8. The deduced amino acid sequence of ATLP-3 shows significant sequence similarity with PR5 and thaumatin-like proteins from Arabidopsis and other plants and contains a putative signal sequence at the amino-terminus. The expression of ATLP-3 and a related gene (ATLP-1) that we previously isolated from Arabidopsis was induced by pathogen infection and salicylic acid, a known inducer of pathogenesis-related genes. Southern blot analysis indicates that the ATLP-1 and ATLP-3 are coded by single-copy genes. To study the effect of ATLP-1 and ATLP-3 proteins on fungal growth, the cDNA regions corresponding to putative mature protein were expressed in Escherichia coli and the cDNA encoded proteins were purified. ATLP-1 and ATLP-3 proteins cross-reacted with anti-osmotin and anti-zeamatin antibodies. ATLP-3 protein showed antifungal activity against several fungal pathogens suggesting that ATLP-3 may be involved in plant defense against fungal pathogens.

16.
17.
The characterization of proteins secreted by Cryptococcus neoformans is of relevance to the identification of vaccine candidates, because concentrated supernatants from the fungus have been shown to be immunoprotective in previous studies. After fractionation of supernatants by anion exchange chromatography and preparative electrophoresis, we obtained the N-terminal amino acid sequences of 13 major proteins. Using a C. neoformans nucleotide database, we were able to clone and sequence the ORFs coding for 12 of these proteins. Some of the genes are identical to previously described ones, while six encode novel proteins, including four putative mannoproteins. The molecular characterization of these and other secreted products may provide useful information in the development of immune-based strategies to control cryptococcosis.

18.
Riemerella anatipestifer is the causative agent of polyserositis of ducks and geese. We have previously reported that a 3.9-kb plasmid, pCFC1, carries protein genes (vapD1 and vapD2) that are similar to virulence-associated genes of other bacteria. In the present study, we report the complete sequence of a second plasmid of 5.6 kb, pCFC2. pCFC2 has a 28% G-C content and three large open reading frames (ORFs). One of the ORFs (designated asVapD1) encodes a polypeptide that shares 53.9, 53.9, 48.3, 48.3 and 46.1% identity with virulence-associated proteins of Dichelobacter nodosus, Actinobacillus actinomycetemcomitans, Neisseria gonorrhoeae, Helicobacter pylori and Haemophilus influenzae, respectively. The second ORF encodes a putative DNA replication protein (RepA3) with 309 amino acids and a molecular mass of approximately 36 kDa. A novel insertion sequence (IS) element, designated ISRa1, was found on the plasmid pCFC2. ISRa1 was flanked by 15-bp imperfect inverted repeats (only one mismatched nucleotide). ISRa1 contained an ORF encoding a putative transposase of 292 amino acids. Southern blot analysis indicated that in R. anatipestifer strains examined, ISRa1 was present with 2-20 copies (at least). ISRa1 displayed a sequence approximately 35% homologous to the putative IS982 and RSBst-alpha from Lactococcus lactis ssp. cremoris SK11 and Bacillus stearothermophilus CU21. Three hybridization patterns of genomic DNA of eight R. anatipestifer strains with an ISRa1 probe indicated that ISRa1 might be a useful tool for epidemiological studies.

19.
Complete DNA sequence of the rat cytomegalovirus genome   (Total citations: 7; self-citations: 0; citations by others: 7)
We have determined the complete genome sequence of the Maastricht strain of rat cytomegalovirus (RCMV). The RCMV genome has a length of 229,896 bp and is arranged as a single unique sequence flanked by 504-bp terminal direct repeats. RCMV was found to have counterparts of all but one of the open reading frames (ORFs) that are conserved between murine CMV (MCMV) and human CMV (HCMV). Like HCMV, RCMV lacks homologs of the genes belonging to the MCMV m02 glycoprotein gene family. However, RCMV contains 15 ORFs with homology to members of the MCMV m145 glycoprotein gene family. Four ORFs are predicted to encode homologs of host proteins; R33 and R78 both putatively encode G protein-coupled receptors, whereas r144 and r131 encode homologs of major histocompatibility class I heavy chains and CC chemokines, respectively. An intriguing feature of the RCMV genome is the presence of an ORF, r127, with similarity to the rep gene of parvoviruses as well as ORF U94 of human herpesvirus 6A (HHV-6A) and HHV-6B. Counterparts of these ORFs have not been found in the other sequenced herpesviruses.

20.
We report here the complete genomic sequence of the Chilean human isolate of Andes virus CHI-7913. The S, M, and L genome segment sequences of this isolate are 1,802, 3,641 and 6,466 bases in length, with an overall GC content of 38.7%. These genome segments code for a nucleocapsid protein of 428 amino acids, a glycoprotein precursor protein of 1,138 amino acids and a RNA-dependent RNA polymerase of 2,152 amino acids. In addition, the genome also has other ORFs coding for putative proteins of 34 to 103 amino acids. The encoded proteins have greater than 98% overall similarity with the proteins of Andes virus isolates AH-1 and Chile R123. Among other sequenced Hantavirus, CHI-7913 is more closely related to Sin Nombre virus, with an overall protein similarity of 92%. The characteristics of the encoded proteins of this isolate, such as hydrophobic domains, glycosylation sites, and conserved amino acid motifs shared with other Hantavirus and other members of the Bunyaviridae family, are identified and discussed.

Copyright©北京勤云科技发展有限公司  京ICP备09084417号