1.

Analysis of two large functionally uncharacterized regions in the Methanopyrus kandleri AV19 genome

Jensen LJ Skovgaard M Sicheritz-Pontén T Jørgensen MK Lundegaard C Pedersen CC Petersen N Ussery D 《BMC genomics》2003,4(1):12

Background

For most sequenced prokaryotic genomes, about a third of the protein coding genes annotated are "orphan proteins", that is, they lack homology to known proteins. These hypothetical genes are typically short and randomly scattered throughout the genome. This trend is seen for most of the bacterial and archaeal genomes published to date.

Results

In contrast we have found that a large fraction of the genes coding for such orphan proteins in the Methanopyrus kandleri AV19 genome occur within two large regions. These genes have no known homologs except from other M. kandleri genes. However, analysis of their lengths, codon usage, and Ribosomal Binding Site (RBS) sequences shows that they are most likely true protein coding genes and not random open reading frames.
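Of the three lines of evidence named in the Results, codon usage is the simplest to illustrate. The following is a minimal Python sketch, not the authors' pipeline: it tabulates relative codon frequencies for a single in-frame coding sequence. The function name `codon_usage` and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    seq = cds.upper()
    # Ignore a trailing partial codon so that only whole triplets are counted.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence (hypothetical): ATG GCC GCC TAA
    print(codon_usage("ATGGCCGCCTAA"))
```

Comparing such a table for a candidate ORF against the table for well-annotated genes of the same genome is one plausible way to judge whether the ORF resembles a real gene; the abstract does not say which statistic the authors used.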

Conclusions

Although these regions can be considered as candidates for massive lateral gene transfer, our bioinformatics analysis suggests that this is not the case. We predict many of the organism specific proteins to be transmembrane and belong to protein families that are non-randomly distributed between the regions. Consistent with this, we suggest that the two regions are most likely unrelated, and that they may be integrated plasmids.

2.

Organization and structure of the Methanococcus transcriptional unit homologous to the Escherichia coli "spectinomycin operon". Implications for the evolutionary relationship of 70 S and 80 S ribosomes (total citations: 10; self-citations: 0; citations by others: 10)

J Auer G Spicker A Böck 《Journal of molecular biology》1989,209(1):21-36

3.

Gene organization and structure of two transcriptional units from Methanococcus coding for ribosomal proteins and elongation factors (total citations: 5; self-citations: 0; citations by others: 5)

J Auer K Lechner A Böck 《Canadian journal of microbiology》1989,35(1):200-204

4.

A frame-specific symmetry of complementary strands of DNA suggests the existence of genes on the antisense strand

Tetsuya Yomo Itaru Urabe 《Journal of molecular evolution》1994,38(2):113-120

The bacterial DNA sequences in the GenBank database were divided into coding and noncoding regions and examined for the base-trimer distribution in every triplet frame on the sense and antisense strands. The results revealed that for the noncoding region, both strands have very similar base-trimer distributions and have no frame specificity; that is, DNA is symmetric in the noncoding region. For the coding region, on the other hand, the symmetry is broken only in the triplet framework, and we found a special triplet-frame-specific symmetry which appears when the two complementary strands of the coding region are read from their 5′ ends. In addition, the following frame specificity was also observed in the distribution of stop codons on the antisense strand of the coding region. When the antisense sequences of the open reading frames (ORFs) in the database are read in the three reading frames, the same reading frame as the corresponding ORF contains a significantly larger number of long open frames without stop codons (i.e., nonstop frames [NSFs]) than expected, while the number of NSFs in the other two reading frames is similar to the expected number. That is, NSFs as well as ORFs are maintained in a frame-specific manner, and in this sense, DNA becomes symmetrical even in the coding region. These two kinds of frame-specific symmetries indicate that only an ORF and its complementary triplets are specifically recognized and maintained in DNA. We suppose that the antisense strands as well as the sense strands in the coding region may be transcribed, thereby producing various kinds of proteins corresponding to NSFs, though their amount may not be large. The presence of these proteins should have some benefits for living organisms, and therefore we propose that these proteins are upcoming enzymes having novel functions. Correspondence to: I. Urabe
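The antisense reading described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: the helper names are hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and frames are simply numbered 0 to 2 from the 5′ end of the antisense strand, so frame 0 is codon-aligned with the ORF only when the ORF length is a multiple of three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_open_run(seq: str, frame: int) -> int:
    """Length, in codons, of the longest stop-free stretch in one frame (0, 1 or 2)."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon ends the current open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def antisense_open_runs(orf: str) -> list[int]:
    """Longest stop-free stretch in each of the three antisense frames of an ORF."""
    antisense = reverse_complement(orf)
    return [longest_open_run(antisense, frame) for frame in range(3)]


if __name__ == "__main__":
    # Toy ORF (hypothetical): ATG AAA TTT TAA
    print(antisense_open_runs("ATGAAATTTTAA"))
```

A nonstop frame in the abstract's sense would then be an antisense frame whose stop-free stretch spans the whole ORF; the threshold for "long" is not given in the abstract and is left to the reader.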

5.

Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. (total citations: 12; self-citations: 2; citations by others: 10)

M Borodovsky K E Rudd E V Koonin 《Nucleic acids research》1994,22(22):4756-4767

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
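The percentages quoted in this abstract follow directly from the ORF counts it reports. The short Python check below only reproduces that arithmetic; the variable names and the `percent` helper are hypothetical, and the counts are taken from the abstract as given.

```python
# Counts reported in the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLASTX/TBLASTN similarity
supported_both = 182  # ORFs supported by both GeneMark and BLAST
conserved = 73        # putative new genes in ancient conserved families


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(percent(supported_both, genemark_orfs))  # share of GeneMark hits
    print(percent(supported_both, blast_orfs))     # share of BLAST hits
    print(percent(conserved, genemark_orfs))       # conserved share of GeneMark predictions
```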

6.

A first look at ARFome: dual-coding genes in mammalian genomes

Chung WY Wadhawan S Szklarczyk R Pond SK Nekrutenko A 《PLoS computational biology》2007,3(5):e91

7.

Coding capacity of complementary DNA strands. (total citations: 7; self-citations: 4; citations by others: 3)

A Casino M Cipollaro A M Guerrini G Mastrocinque A Spena V Scarlato 《Nucleic acids research》1981,9(6):1499-1518

A Fortran computer algorithm has been used to analyze the nucleotide sequence of several structural genes. The analysis performed on both coding and complementary DNA strands shows that whereas open reading frames shorter than 100 codons are randomly distributed on both DNA strands, open reading frames longer than 100 codons ("virtual genes") are significantly more frequent on the complementary DNA strand than on the coding one. These "virtual genes" were further investigated by looking at intron sequences, splicing points, signal sequences and by analyzing gene mutations. On the basis of this analysis coding and complementary DNA strands of several eukaryotic structural genes cannot be distinguished. In particular we suggest that the complementary DNA strand of the human epsilon-globin gene might indeed code for a protein.

8.

Chicken NFI/TGGCA proteins are encoded by at least three independent genes: NFI-A, NFI-B and NFI-C with homologues in mammalian genomes. (total citations: 27; self-citations: 8; citations by others: 19)

R A Rupp U Kruse G Multhaup U Göbel K Beyreuther A E Sippel 《Nucleic acids research》1990,18(9):2607-2616

9.

Sequence, organization, transcription and evolution of RNA polymerase subunit genes from the archaebacterial extreme halophiles Halobacterium halobium and Halococcus morrhuae (total citations: 30; self-citations: 0; citations by others: 30)

H Leffers F Gropp F Lottspeich W Zillig R A Garrett 《Journal of molecular biology》1989,206(1):1-17

10.

Thermus thermophilus bacteriophage phiYS40 genome and proteomic characterization of virions

Naryshkina T Liu J Florens L Swanson SK Pavlov AR Pavlova NV Inman R Minakhin L Kozyavkin SA Washburn M Mushegian A Severinov K 《Journal of molecular biology》2006,364(4):667-677

We determined the sequence of the 152,372 bp genome of phiYS40, a lytic tailed bacteriophage of Thermus thermophilus. The genome contains 170 putative open reading frames and three tRNA genes. Functions for 25% of phiYS40 gene products were predicted on the basis of similarity to proteins of known function from diverse phages and bacteria. phiYS40 encodes a cluster of proteins involved in nucleotide salvage, such as flavin-dependent thymidylate synthase, thymidylate kinase, ribonucleotide reductase, and deoxycytidylate deaminase, and in DNA replication, such as DNA primase, helicase, type A DNA polymerase, and predicted terminal protein involved in initiation of DNA synthesis. The structural genes of phiYS40, most of which have no similarity to sequences in public databases, were identified by mass spectrometric analysis of purified virions. Various phiYS40 proteins have different phylogenetic neighbors, including myovirus, podovirus, and siphovirus gene products, bacterial genes and, in one case, a dUTPase from a eukaryotic virus. phiYS40 has apparently arisen through multiple acts of recombination between different phage genomes as well as through acquisition of bacterial genes.

11.

Purifying and directional selection in overlapping prokaryotic genes (total citations: 4; self-citations: 0; citations by others: 4)

Rogozin IB Spiridonov AN Sorokin AV Wolf YI Jordan IK Tatusov RL Koonin EV 《Trends in genetics : TIG》2002,18(5):228-232

In overlapping genes, the same DNA sequence codes for two proteins using different reading frames. Analysis of overlapping genes can help in understanding the mode of evolution of a coding region from noncoding DNA. We identified 71 pairs of convergent genes, with overlapping 3' ends longer than 15 nucleotides, that are conserved in at least two prokaryotic genomes. Among the overlap regions, we observed a statistically significant bias towards the 123:132 phase (i.e. the second codon base in one gene facing the degenerate third position in the second gene). This phase ensures the least mutual constraint on nonconservative amino acid replacements in both overlapping coding sequences. The excess of this phase is compatible with directional (positive) selection acting on the overlapping coding regions. This could be a general evolutionary mode for genes emerging from noncoding sequences, in which the protein sequence has not been subject to selection.

12.

Nuclear protein factors binding with specific DNA sequences

K T Turpaev E S Vasetskiĭ 《Genetika》1990,26(5):804-816

Primary structure of thousands of genes is being determined in many laboratories worldwide. While it is relatively easy to analyse the coding region(s) of genes, it is usually hard to understand what is located in non-coding regions. A non-coding region may contain very valuable information about the mode of functioning of a given gene, e.g. promoters, enhancers, silencers etc. The regulatory function of these sequences is determined by their interaction with certain sequence-specific proteins, i.e. the presence of a certain DNA sequence in a non-coding region of a gene may suggest that the gene is regulated by a specific protein factor. This minireview summarizes recent data on most known eukaryotic sequence-specific DNA-binding protein factors, including their origin, DNA consensus, and their role in expression of corresponding genes.

13.

The nucleotide sequence and genome organization of the polyoma early region: extensive nucleotide and amino acid homology with SV40. (total citations: 84; self-citations: 0; citations by others: 84)

T Friedmann A Esty P LaPorte P Deininger 《Cell》1979,17(3):715-724

14.

Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification (total citations: 12; self-citations: 0; citations by others: 12)

B. Rehm 《Applied microbiology and biotechnology》2001,57(5-6):579-592

The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.

15.

The genome of fowlpox virus (total citations: 15; self-citations: 0; citations by others: 15)

Afonso CL Tulman ER Lu Z Zsak L Kutish GF Rock DL 《Journal of virology》2000,74(8):3815-3831

16.

Rational genomics I: antisense open reading frames and codon bias in short-chain oxido reductase enzymes and the evolution of the genetic code

Duax WL Huether R Pletnev VZ Langs D Addlagatta A Connare S Habegger L Gill J 《Proteins》2005,61(4):900-906

The short-chain oxidoreductase (SCOR) family of enzymes includes over 6000 members, extending from bacteria and archaea to humans. Nucleic acid sequence analysis reveals that significant numbers of these genes are remarkably free of stop codons in reading frames other than the coding frame, including those on the antisense strand. The genes from this subset also use almost entirely the GC-rich half of the 64 codons. Analysis of a million hypothetical genes having random nucleotide composition shows that the percentage of SCOR genes having multiple open reading frames exceeds random by a factor of as much as 1 × 10^6. Nevertheless, screening the content of the SWISS-PROT TrEMBL database reveals that 15% of all genes contain multiple open reading frames. The SCOR genes having multiple open reading frames and a GC-rich coding bias exhibit a similar GC bias in the nucleotide triple composition of their DNA. This bias is not correlated with the GC content of the species in which the SCOR genes are found. One possible explanation for the conservation of multiple open reading frames and extreme bias in nucleic acid composition in the family of Rossmann folds is that the primordial member of this family was encoded early using only very stable GC-rich DNA and that evolution proceeded with extremely limited introduction of any codons having two or more adenine or thymine nucleotides. These and other data suggest that the SCOR family of enzymes may even have diverged from a common ancestor before most of the AT-rich half of the genetic code was fully defined.
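The "GC-rich half of the 64 codons" can be made concrete with a small Python sketch. It assumes, following the abstract's wording about codons "having two or more adenine or thymine nucleotides", that a codon counts as GC-rich when at least two of its three bases are G or C; under that assumption exactly 32 of the 64 codons qualify. The function names are hypothetical and the toy sequence is illustrative only.

```python
from itertools import product


def is_gc_rich(codon: str) -> bool:
    """True when at least two of the three bases are G or C (assumed definition)."""
    return sum(base in "GC" for base in codon.upper()) >= 2


ALL_CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]
GC_RICH_CODONS = [codon for codon in ALL_CODONS if is_gc_rich(codon)]


def gc_rich_fraction(cds: str) -> float:
    """Fraction of whole in-frame codons in `cds` that fall in the GC-rich half."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(is_gc_rich(codon) for codon in codons) / len(codons)


if __name__ == "__main__":
    print(len(GC_RICH_CODONS), "of", len(ALL_CODONS), "codons are GC-rich")
    # Toy sequence (hypothetical): GCC GGC ATT AAA
    print(gc_rich_fraction("GCCGGCATTAAA"))
```

A gene that uses "almost entirely" the GC-rich half would give a fraction close to 1.0 under this definition.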

17.

Comparative studies of ribosomal proteins and their genes from Methanococcus vannielii and other organisms (total citations: 3; self-citations: 0; citations by others: 3)

A K Köpke B Wittmann-Liebold 《Canadian journal of microbiology》1989,35(1):11-20

Using data from a partial protein sequence analysis of ribosomal proteins derived from the archaebacterium Methanococcus vannielii, oligonucleotide probes were synthesized. The probes enabled us to localize several ribosomal protein genes and to determine their nucleotide sequences. The amino acid sequences that were deduced from the genes correspond to proteins L12 and L10 from the rif operon, according to the genome organization in Escherichia coli, and to proteins L23 and L2, which have comparable locations, as in the Escherichia coli S10 operon. Various degrees of similarity were found when the four proteins were compared with the corresponding ribosomal proteins of prokaryotic or eukaryotic organisms. The highest sequence homology was found in counterparts from other archaebacteria, such as Halobacterium marismortui, Halobacterium halobium, or Sulfolobus. In general, the M. vannielii protein sequences were more related to the eukaryotic kingdom than to the Gram-positive or Gram-negative eubacteria. On the other hand, the organization of the ribosomal protein genes clearly follows the operon structure of the Escherichia coli genome and is different from the monocistronic eukaryotic gene arrangements. The protein coding regions were not interrupted by introns. Furthermore, the Shine-Dalgarno type sequences of methanogenic bacteria are homologous with those of eubacteria, and also their terminator regions are similar.

18.

Positive regulators of opine-inducible promoters in the nopaline and octopine catabolism regions of Ti plasmids. (total citations: 8; self-citations: 0; citations by others: 8)

J von Lintig H Zanker J Schröder 《Molecular plant-microbe interactions : MPMI》1991,4(4):370-378

19.

Characterisation of the 11 Kb DNA region adjacent to the gene encoding Desulfovibrio gigas flavoredoxin.

Manuela Broco Ana Marques Solange Oliveira Claudina Rodrigues-Pousada 《DNA sequence》2005,16(3):207-216

20.

Complete nucleotide sequence of ubiquitous plasmid pEA29 from Erwinia amylovora strain Ea88: gene organization and intraspecies variation

McGhee GC Jones AL 《Applied and environmental microbiology》2000,66(11):4897-4907
