Similar literature
 20 similar articles found (search time: 31 ms)
1.
The frequencies of occurrence of mono- and dinucleotides in sequenced E. coli DNA fragments were analysed. DNA sequences with a total length of 135,000 nucleotides were considered. DNA fragments with different functional properties were shown to differ in the parameters of correlation between neighbouring nucleotides. Moreover, a periodic positional dependence of the correlation parameters was found in coding regions. The evolutionary significance of these observations is discussed, as is the possibility of using them in a dedicated model of nucleotide sequences, which is needed to develop computer algorithms for recognizing functional units in genomes.
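
One common way to quantify correlation between neighbouring nucleotides is the ratio of the observed dinucleotide frequency to the frequency expected if neighbours were independent. The sketch below assumes that measure; the abstract does not say which correlation parameter the authors used, and the function name `dinucleotide_ratios` is hypothetical.

```python
from collections import Counter

def dinucleotide_ratios(seq):
    """Return the observed/expected ratio for each dinucleotide present in seq.

    The expected frequency assumes independent neighbours: f(x) * f(y).
    A ratio above 1 means the pair is over-represented, below 1 under-represented.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    ratios = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios

# Toy input (not real E. coli sequence): a strictly alternating fragment.
ratios = dinucleotide_ratios("ATATATATAT")
```

Computing the same ratios separately for each codon position would be one way to look for the positional dependence the abstract mentions.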

2.
A hidden Markov model that finds genes in E. coli DNA.   Total citations: 12, self-citations: 1, citations by others: 11
A Krogh, I S Mian, D Haussler. Nucleic Acids Research, 1994, 22(22): 4768-4778
A hidden Markov model (HMM) has been developed to find protein coding genes in E. coli DNA using E. coli genome DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E. coli genes, as well as the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. To account for potential sequencing errors and/or frameshifts in raw genomic DNA sequence, it allows for the (very unlikely) possibility of insertions and deletions of individual nucleotides within a codon. The parameters of the HMM are estimated using approximately one million nucleotides of annotated DNA in EcoSeq6, and the model is tested on a disjoint set of contigs containing about 325,000 nucleotides. The HMM finds the exact locations of about 80% of the known E. coli genes, and approximate locations for about 10%. It also finds several potentially new genes, and locates several places where insertion or deletion errors and/or frameshifts may be present in the contigs.
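
The decoding step of an HMM gene finder can be illustrated with the Viterbi algorithm. The sketch below is a toy two-state model (coding versus intergenic) that emits single nucleotides. It is not the architecture described in the abstract, which models codons, intergenic patterns and indels, and every probability in it is invented for illustration.

```python
import math

STATES = ("coding", "intergenic")
# All probabilities below are invented for illustration only.
START = {"coding": 0.5, "intergenic": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "intergenic": 0.1},
    "intergenic": {"coding": 0.1, "intergenic": 0.9},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def viterbi(seq):
    """Return the most probable state path for a non-empty ACGT string."""
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(TRANS[best_prev][s])
                             + math.log(EMIT[s][symbol]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

path = viterbi("GCGCGCGCATATATAT")
```

With these made-up parameters the GC-rich half is labelled coding and the AT-rich half intergenic. A real gene finder would estimate its parameters from annotated sequence, as the abstract describes.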

3.
Dynamic flexibility in the Escherichia coli genome.   Total citations: 2, self-citations: 0, citations by others: 2
L Tsai, Z Sun. FEBS Letters, 2001, 507(2): 225-230
Empirical rules based on tetranucleotide parameters were presented to predict the structural parameters twist (Omega), roll (rho), tilt (tau) and slide (D(y)). A statistical mechanical model was used to analyze the flexibility of the Escherichia coli genome. The replication terminus region displayed a low level of flexibility. A strong correlation can be seen between G+C content and flexibility. Average flexibilities in the coding regions were found to be significantly larger than those in non-coding regions. The flexible characteristics in the 5'-neighborhood of the coding regions and in three class sigma promoter sequences in the E. coli genome were also analyzed.

4.
The sequence of the 6S RNA gene of Pseudomonas aeruginosa.   Total citations: 1, self-citations: 0, citations by others: 1
From the gram-negative eubacterium Pseudomonas aeruginosa we have isolated a stable 6S RNA, approximately 180 nucleotides in length. The RNA was partially sequenced and identified by comparison with the known Escherichia coli 6S RNA sequence. Southern hybridizations revealed a single copy gene coding for the 6S RNA. DNA from other prokaryotes, i.e. E. coli, Thermus thermophilus, Bacillus subtilis, Bacillus stearothermophilus and Halobacterium maris mortui, did not give detectable hybridization signals. The 6S RNA gene was cloned in E. coli and its complete primary structure was determined. Although the 6S RNA sequences from P. aeruginosa and E. coli share only a 60.4% homology, we are able to propose a common secondary structural model.

5.
6.
Complete nucleotide sequence of the Escherichia coli recB gene.   Total citations: 21, self-citations: 6, citations by others: 15
The complete nucleotide sequence of the Escherichia coli recB gene which encodes a subunit of the ATP-dependent DNase, Exonuclease V, has been determined. The proposed coding region for the RecB protein is 3543 nucleotides long and would encode a polypeptide of 1180 amino acids with a calculated molecular weight of 133,973. The start of the recB coding sequence overlaps the 3' end of the upstream ptr gene, and the recB termination codon overlaps the initiation codon of the downstream recD gene, suggesting that these genes may form an operon. No sequences which reasonably fit the consensus for an E. coli promoter could be identified upstream of the proposed recB translational start. The predicted RecB amino acid sequence contains regions of homology with ATPases, DNA binding proteins and DNA repair enzymes.
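
The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the 3543-nucleotide coding region includes the stop codon, which the abstract does not state explicitly; under that assumption the numbers are mutually consistent.

```python
coding_nt = 3543                 # coding-region length reported in the abstract
codons = coding_nt // 3          # 1181 codons if the stop codon is counted
residues = codons - 1            # the stop codon contributes no amino acid
molecular_weight = 133_973       # calculated molecular weight from the abstract
mean_residue_mass = molecular_weight / residues  # roughly 113.5 per residue
```

A mean residue mass slightly above 110 Da is in the range usually seen for proteins, so the stated length and molecular weight agree with each other.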

7.
8.
The mitochondrial gene coding for the large ribosomal RNA (21S) has been isolated from a rho- clone of Saccharomyces cerevisiae. A DNA segment of about 5500 base pairs has been sequenced, which includes the entire sequence coding for the mature ribosomal RNA and the intron. The mature RNA sequence corresponds to a length of 3273 nucleotides. Despite the very low guanine-cytosine content (20.5%), many stretches of sequence are homologous to the corresponding Escherichia coli 23S ribosomal RNA. The sequence can be folded into a secondary structure according to the general models for prokaryotic and eukaryotic large ribosomal RNAs. Like the E. coli gene, the mitochondrial gene contains sequences that resemble the eukaryotic 5.8S and the chloroplastic 4.5S ribosomal RNAs. The 5' and 3' end regions show a complementarity over fourteen nucleotides.

9.
10.
11.
The complete nucleotide sequence of the 16S RNA from Proteus vulgaris has been determined. The molecule (1544 nucleotides) shows 93% homology with the sequence of E. coli 16S RNA. Six methylated nucleotides have been localized in positions homologous to those observed in the E. coli RNA molecule. Both E. coli and P. vulgaris 16S RNA chains can be folded up into a common secondary structure scheme. Comparative sequence analysis of the two molecules has provided a valuable contribution to 16S RNA secondary structure model building.

12.
Bacteriophage T7's gene 0.3, coding for an antirestriction protein, possesses one of the strongest translation initiation regions (TIR) in E. coli. It was isolated on DNA fragments of differing length and cloned upstream of the mouse dihydrofolate reductase gene in an expression vector to control the translation of this gene's sequence. The TIR's efficiency was highly dependent on nucleotides +15 to +26 downstream of the gene's AUG. This sequence is complementary to nucleotides 1471-1482 of the 16S rRNA. Similar sequences complementary to this rRNA region are present in other efficient TIRs of the E. coli genome and those of its bacteriophages. There seems to be a correlation between this sequence homology and the efficiency of the initiation signals. We propose that this region specifies a stimulatory interaction between the mRNA and 16S rRNA besides the Shine-Dalgarno interaction during the translation initiation step.

13.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
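
Long-range correlations of this kind are often measured with a "DNA walk": each base becomes a step of +1 or -1, and the root-mean-square fluctuation F(l) of the walk over windows of length l is fitted to a power law F(l) ~ l^alpha. An exponent near 0.5 indicates no long-range correlation. The abstract does not describe its algorithm, so the sketch below is only an assumed illustration of that general measurement, with hypothetical function names and window sizes, and it is not the coding-region finder itself.

```python
import math
import random

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for any other symbol."""
    walk, position = [0], 0
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        walk.append(position)
    return walk

def fluctuation(walk, window):
    """Root-mean-square fluctuation of displacements over `window` steps."""
    deltas = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))

def scaling_exponent(seq, windows=(4, 8, 16, 32)):
    """Least-squares slope of log F(l) against log l.

    Raises ValueError if any F(l) is zero (e.g. a perfectly periodic input).
    """
    walk = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(walk, w)) for w in windows]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

# Synthetic uncorrelated sequence (not real genomic data).
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(random_seq)
```

For this uncorrelated synthetic sequence the exponent should come out close to 0.5. Applying the same function in a sliding window along a genome would be one way to look for regions whose exponent differs from that of their surroundings.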

14.
15.
16.
The argF gene encoding ornithine carbamoyltransferase (OTCase; EC 2.1.3.3) has been cloned from Corynebacterium glutamicum by transforming the Escherichia coli arginine auxotroph with the genomic DNA library. The cloned DNA also complements the E. coli argG mutant, suggesting a clustered organization of the genes in the genome. We have determined the DNA sequence of the minimal fragment complementing the E. coli argF mutant. The coding region of the cloned gene is 957 nucleotides long and encodes a polypeptide with a deduced molecular mass of about 35 kDa. The enzyme activity and size of the expressed protein in the E. coli auxotroph carrying the argF gene revealed that the cloned gene indeed codes for OTCase. Analysis of the amino acid sequence of the predicted protein revealed a strong similarity to the corresponding protein of other bacteria.

17.
A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences of run distributions in DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions. However, some interesting exceptions from this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
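
Counting runs is straightforward to sketch. The example below tallies, for each nucleotide, how many runs of each length occur in a sequence; the function name `run_length_counts` and the toy input are hypothetical, and the abstract does not specify how the authors tabulated or normalised their counts.

```python
from collections import Counter
from itertools import groupby

def run_length_counts(seq):
    """Map each symbol to a Counter of {run length: number of runs}."""
    counts = {}
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        counts.setdefault(base, Counter())[length] += 1
    return counts

# Toy input: runs are AA, GGG, C, TTTT, A.
runs = run_length_counts("AAGGGCTTTTA")
```

Purine-pyrimidine runs can be counted with the same function by first recoding A and G as one symbol and C and T as another.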

18.
19.
The entropies of protein coding genes from Escherichia coli were calculated according to Boltzmann's formula. Entropies of the coding regions were compared to the entropies of noncoding or miscoding ones. With nucleotides as code units, the entropies of the coding regions, when compared to the entropies of complete sequences (leader and coding region as well as trailer), were seen to be lower but with a marginal statistical significance. With triplets of nucleotides as code units, the entropies of correct reading frames were significantly lower than the entropies of frameshifts +1 and -1. With amino acids as code units, the results were opposite: Biologically functional proteins had significantly higher entropies than proteins translated from the frameshifted sequences. We attempt to explain this paradox with the hypothesis that the genetic code may have the ability of lowering information content (increasing entropy) of proteins while translating them from DNA. This ability might be beneficial to bacteria because it would make the functional proteins more probable (having a higher entropy) than nonfunctional proteins translated from frameshifted sequences.
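
Comparing reading frames by entropy can be sketched as follows. This example uses the Shannon entropy of triplet frequencies, which is a substitute measure chosen for illustration and not necessarily the Boltzmann-formula calculation used by the authors. The helper names and the toy gene are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(units):
    """Shannon entropy (bits per unit) of the frequency distribution of units."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def triplets(seq, frame=0):
    """Split seq into complete, non-overlapping triplets starting at `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# Toy sequence for illustration only; far too short for a meaningful estimate.
gene = "ATGGCTGCTGCTGCTGCTTAA"
frame_entropies = {f: shannon_entropy(triplets(gene, f)) for f in (0, 1, 2)}
```

Frame 0 is the annotated reading frame, and frames 1 and 2 are the two shifted frames. A meaningful comparison would need many full-length genes and a significance test, as the abstract implies.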

20.