Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Learning processes are applied to the recognition of protein coding regions in prokaryotes. Non-contradictory, statistical rules are deduced from a set of known examples of coding regions. These rules allow us to build characteristic patterns on the mRNA upstream of the initiating codon. These rules are applied to recognize more than 180 coding sequences.

2.
Two independent methods are used to evaluate the protein-coding information content in different classes of DNA sequences. The first method evaluates the statistical relevance of finding unidentified reading frames, longer than 100 codons, on both DNA strands of: a) 117 DNA sequences that code for 142 nuclear proteins; b) 39 stable RNA coding sequences; and c) 36 other DNA sequences, which include regulatory sequences and sequences of as yet unknown function. The finding of 50 reading frames longer than 100 codons (complementary inverted proteins or c.i.p. genes) located on the DNA strand complementary to the protein-coding one is drastically in excess of the number predicted by chance alone. An independent method (testcode) applied to c.i.p. gene sequences, which assigns the probability of coding to a given sequence, predicts that more than 50% of these genes are translated into a functional product. These analyses indicate the existence of a new class of protein-coding genes, located on the DNA sequences complementary to the protein-coding DNA strand.
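The search for long reading frames on both strands can be illustrated with a minimal sketch. It assumes that a "reading frame" means a run of codons uninterrupted by a stop codon, which the abstract does not specify; the function names and the `min_codons` parameter are hypothetical and not taken from the paper.

```python
# Minimal sketch: list stop-free runs of codons on both strands.
# Assumption: a "reading frame" is any run of codons without TAA/TAG/TGA.
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def long_open_frames(seq, min_codons=100):
    """Return (strand, frame, start, n_codons) for every stop-free run
    of at least min_codons codons. For the '-' strand, start is counted
    along the reverse-complemented sequence."""
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOPS:
                    n_codons = (pos - run_start) // 3
                    if n_codons >= min_codons:
                        hits.append((strand, frame, run_start, n_codons))
                    run_start = pos + 3
                pos += 3
            # Close the run that reaches the end of the sequence.
            n_codons = (pos - run_start) // 3
            if n_codons >= min_codons:
                hits.append((strand, frame, run_start, n_codons))
    return hits
```

Comparing the number of hits on the complementary strand with the number obtained from shuffled sequences of the same composition would give a rough analogue of the "predicted by chance" baseline mentioned above.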

3.
H Grosjean  W Fiers 《Gene》1982,18(3):199-209
By considering the nucleotide sequence of several highly expressed coding regions in bacteriophage MS2 and mRNAs from Escherichia coli, it is possible to deduce some rules which govern the selection of the most appropriate synonymous codons NNU or NNC read by tRNAs having GNN, QNN or INN as anticodon. The rules fit with the general hypothesis that an efficient in-phase translation is facilitated by proper choice of degenerate codewords promoting a codon-anticodon interaction with intermediate strength (optimal energy) over those with very strong or very weak interaction energy. Moreover, codons corresponding to minor tRNAs are clearly avoided in these efficiently expressed genes. These correlations are clearcut in the normal reading frame but not in the corresponding frameshift sequences +1 and +2. We hypothesize that both the optimization of codon-anticodon interaction energy and the adaptation of the population to codon frequency or vice versa in highly expressed mRNAs of E. coli are part of a strategy that optimizes the efficiency of translation. Conversely, codon usage in weakly expressed genes such as repressor genes follows exactly the opposite rules. It may be concluded that, in addition to the need for coding an amino acid sequence, the energetic consideration for codon-anticodon pairing, as well as the adaptation of codons to the tRNA population, may have been important evolutionary constraints on the selection of the optimal nucleotide sequence.

4.
While veritable oceans of ink have been spilled over the base distributions within genes, the literature is virtually silent on large-scale intragenomic base distribution. To address this issue, we have examined approximately 3400 chromosomal sequences from approximately 2000 entire genomes, including DNA and RNA, single- and double-stranded, coding and non-coding genomes. For each sequence the mean, variance, skewness, and kurtosis for each base were computed along with the genome base composition. The main findings are: (1) there is no simple relationship between these statistics and the base composition of the genome, (2) in non-viral genomes, base distribution is non-uniform, (3) base distribution in non-eukaryotic genomes obeys a number of simple rules, (4) these rules are not dependent on the presence of coding sequences, (5) bacterial genomes in particular are unusually compliant with these rules, and (6) eukaryotes have a unique pattern of base distribution.
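As a rough illustration of the four statistics named above, the sketch below computes them for the positions at which each base occurs along a sequence. The abstract does not say which variable the moments were taken over, so treating them as moments of normalized base positions is an assumption, and the function name is hypothetical.

```python
# Minimal sketch: per-base moments of normalized positions along a sequence.
# Assumption: the statistics describe where each base sits in the sequence,
# with positions rescaled to the interval (0, 1).
def base_position_moments(seq):
    """Return {base: {"mean", "variance", "skewness", "kurtosis"}}."""
    seq = seq.upper()
    n = len(seq)
    out = {}
    for base in "ACGT":
        xs = [(i + 0.5) / n for i, b in enumerate(seq) if b == base]
        if len(xs) < 2:
            continue  # moments are undefined for fewer than two occurrences
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        sd = var ** 0.5
        skew = sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)
        kurt = sum(((x - mean) / sd) ** 4 for x in xs) / len(xs)
        out[base] = {
            "mean": mean,
            "variance": var,
            "skewness": skew,
            "kurtosis": kurt,  # plain (non-excess) kurtosis
        }
    return out
```

Under this reading, a base spread evenly along the sequence gives a mean near 0.5, a skewness near 0 and a kurtosis near 1.8, the value for a uniform distribution, so departures from those values would indicate non-uniform base distribution.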

5.
Application of learning techniques to splicing site recognition   Total citations: 2 (self-citations: 0, citations by others: 2)
J Quinqueton  J Moreau 《Biochimie》1985,67(5):541-547
Most genes of eukaryotic genomes are disrupted by introns. The application of a learning technique which uses both statistical and syntactic analysis led to the establishment of logical rules enabling the recognition of intron/exon junctions between noncoding and coding sequences. The rules were tested on rat actin gene sequences containing some or all of the introns and 50 exon nucleotides on either side of the intron. The results show good recognition of the excision site. This recognition is more ambiguous when the sequence is short; for the acceptor sequence it presents a good selection. The learning achieved with both the donor and acceptor sequence does not lead to recognition. This result indicates that it is not the relationship between donor and acceptor sites in the same intron which determines sequence selection or the splicing mechanism.

6.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distributions of GC levels of the third codon positions of genes from cold-blooded vertebrates are distinctly different from those of warm-blooded vertebrates in that they do not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.
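The GC levels at individual codon positions discussed above can be computed along the following lines. This is a minimal sketch that assumes the input is an in-frame coding sequence starting at the first codon position; the function name is hypothetical.

```python
# Minimal sketch: GC fraction at codon positions 1, 2 and 3.
# Assumption: cds is in frame; trailing bases that do not fill a codon
# are ignored.
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as fractions of the number of codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(c / n_codons for c in counts)


gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAA")
gc12 = (gc1 + gc2) / 2  # combined first + second position level
```

Plotting GC3 against the combined first + second position level for many genes would be one way to look at the second correlation described in the abstract.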

7.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.
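The general idea of scoring a sequence by how well it compresses can be sketched as follows. The abstract does not define its compression index, so this example substitutes a generic zlib compression ratio as a stand-in; it is not the paper's index, and the function name is hypothetical.

```python
# Minimal sketch: a generic compression ratio as a crude stand-in for a
# "compression index". Assumption: zlib (DEFLATE) replaces the paper's
# dictionary-based method, which is not described in the abstract.
import random
import zlib


def compression_ratio(seq):
    """Compressed size divided by raw size; lower means more regular."""
    raw = seq.upper().encode("ascii")
    if not raw:
        raise ValueError("empty sequence")
    return len(zlib.compress(raw, 9)) / len(raw)


# Illustrative inputs: a highly repetitive sequence and a pseudo-random one.
rng = random.Random(0)
repetitive = "ACGT" * 500
shuffled = "".join(rng.choice("ACGT") for _ in range(2000))
```

The repetitive sequence compresses far better than the pseudo-random one, which mirrors the failure modes noted above: highly structured coding regions and quasi-random non-coding regions fall on the "wrong" side of such a score.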

8.
Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^-30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5' to annotated coding regions. On average, the 500 nucleotides upstream of coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5' noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks.

9.
10.
We report the isolation and sequencing of genomic copies of mariner transposons involved in recent horizontal transfers into the genomes of the European earwig, Forficula auricularia; the European honey bee, Apis mellifera; the Mediterranean fruit fly, Ceratitis capitata; and a blister beetle, Epicauta funebris, insects from four different orders. These elements are in the mellifera subfamily and are the second documented example of full-length mariner elements involved in this kind of phenomenon. We applied maximum likelihood methods to the coding sequences and determined that the copies in each genome were evolving neutrally, whereas reconstructed ancestral coding sequences appeared to be under selection, which strengthens our previous hypothesis that the primary selective constraint on mariner sequence evolution is the act of horizontal transfer between genomes.

11.
The isochore structure of the nuclear genome of angiosperms described by Salinas et al. (1) was confirmed by using a different experimental approach, namely by showing that the GC levels of coding sequences from both dicots and Gramineae are linearly correlated with GC levels of the corresponding flanking sequences. The compositional distributions of homologous coding sequences from several orders of dicots and from Gramineae were also studied and shown to mimic the compositional distributions previously seen (1) for coding sequences in general, most coding sequences from Gramineae being much higher than those of the dicots explored. These differences were even stronger for third codon positions and led to striking codon usages for many coding sequences, especially in the case of Gramineae.

12.
The Notch locus of Drosophila melanogaster   Total citations: 48 (self-citations: 0, citations by others: 48)
S Kidd  T J Lockett  M W Young 《Cell》1983,34(2):421-433

13.
14.
There are no well-known properties in regulatory DNA analogous to those in coding sequences; their spatial location is not regular, the consensus regulatory elements are often degenerate and there are no understandable rules governing their evolution. This makes it difficult to recognize regulatory regions within a genome. We review developments in the statistical characterization of regulatory regions and methods of their recognition in eukaryotic genomes.

15.
Single-stranded DNA (ssDNA) isolated from (and amounting to 1.5-2% of) native nuclear DNA of cultured embryonic chicken cells labelled for 1-2 days with 3H-thymidine was analyzed by self-hybridization, hydroxyapatite chromatography (HAC), partial digestion with S1 nuclease, and isopycnic centrifugation. Two main fractions were rehybridized to excess amounts of bulk nuclear DNA or total cytoplasmic RNAs. The major fraction, equivalent to 75% of total ssDNA, consists of unique DNA sequences, apparently derived from multiple coding regions of the cell genome, since they are not self-reassociating but are hybridizable to the non-repetitious portion of bulk nuclear DNA and 40-45% of them are complementary to cell RNAs. About half of these ssDNA sequences hybridizable to cell RNAs seem to be closely connected with molecules belonging to the minor ssDNA fraction. The latter fraction consists of self-reassociating, moderately repeated DNA sequences, mainly derived from non-coding regions of the cell genome. These findings are discussed in the light of others, showing interspersion of coding and non-coding DNA sequences and susceptibility of active genes to certain nuclease attacks.

16.
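On the basis of the full symmetry group of DNA, we first give the one-to-one correspondence between all symmetry operations of the regular tetrahedron and the elements of the base transformation group; we then summarize the symmetry principles for identifying hydrophobic or hydrophilic codons; finally, we discuss the symmetry of sequences of ambiguous codons.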

17.
We have analyzed micrococcal nuclease (MNase) DNA cleavage patterns at the sequence level by examining 2.3 × 10^3 base pairs of data derived from the Drosophila melanogaster 44D larval cuticle locus. Within this region, MNase preferentially cleaved 140 sites. Clusters of these sites appear to generate the preferential MNase eukaryotic DNA cleavage sites seen on agarose gels at roughly 100 to 300 base-pair intervals. These clusters of preferential cleavage sites rarely occur within gene coding regions. The analysis revealed that duplex DNA sequences preferentially cleaved by MNase are generally determined by a single strand sequence: d(A-T)n, where n ≥ 1, flanked by a 5' dC or dG. Cleavage of the other strand is generally staggered 5' by several nucleotides and occurs even if such sequences are absent on that strand. An empirical predictive DNA cleavage model derived from a statistical analysis of the sequence level data was applied to seven eukaryotic gene loci of known sequence. The predicted patterns were in good general agreement with the previously observed eukaryotic gene/spacer cleavage pattern. Statistical analysis also revealed that sites of predicted preferential DNA cleavage occur less frequently in protein coding regions than for randomized sequences of the same length and nucleotide content. Comparison of the MNase cleavage patterns to the sequence-dependent pattern of binding energies between duplex DNA strands indicates that MNase preferentially cleaves sequences with low helix stability.
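A sequence scan for the motif described above might look like the sketch below. It reads d(A-T)n as n consecutive AT dinucleotides preceded on the 5' side by C or G; that reading is an assumption, since the abstract could also mean a run of A/T residues, and this is not the empirical predictive model the authors derived. The names are hypothetical.

```python
# Minimal sketch: find candidate sites matching one reading of the motif.
# Assumption: d(A-T)n means (AT) repeated n >= 1 times, with a C or G
# immediately 5' of the repeat on the scanned strand.
import re

MOTIF = re.compile(r"[CG](?:AT)+")


def candidate_sites(seq):
    """Return (start, matched_text) for each motif occurrence, 0-based."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq.upper())]
```

Counting such sites inside and outside annotated coding regions, and in shuffled sequences of the same composition, would be a simple way to explore the depletion in coding regions reported above.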

18.
M Sollazzo  R Frank  G Cesareni 《Gene》1985,37(1-3):199-206
We show that the fusion between regulatory sequences present on expression vectors and coding sequences can be efficiently achieved by oligonucleotide-directed mutagenesis. We have constructed single-stranded (ss) expression vectors that facilitate this process. These plasmids derive from vectors that have been used for the synthesis of quantities of proteins in Escherichia coli or RNAs in vitro. By inserting the origin of replication of the ss phage f1 into these plasmids it became possible to package their ss DNA into phage rods. Deletion of unwanted sequences or simple base changes can then be obtained by oligonucleotide-directed mutagenesis using the vector ss DNA as a template. We discuss the results of several experiments where this technique was applied to our expression vectors and we demonstrate the construction of a plasmid which efficiently synthesizes in vitro a regulatory RNA molecule that is involved in the control of plasmid copy number.

19.
20.
The melting of the coding and non-coding classes of natural DNA sequences was investigated using a program, MELTSIM, which simulates DNA melting based upon an empirically parameterized nearest neighbor thermodynamic model. We calculated T(m) results of 8144 natural sequences from 28 eukaryotic organisms of varying F(GC) (mole fraction of G and C) and of 3775 coding and 3297 non-coding sequences derived from those natural sequences. These data demonstrated that the T(m) vs. F(GC) relationships in coding and non-coding DNAs are both linear but have a statistically significant difference (6.6%) in their slopes. These relationships are significantly different from the T(m) vs. F(GC) relationship embodied in the classical Marmur-Schildkraut-Doty (MSD) equation for the intact long natural sequences. By analyzing the simulation results from various base shufflings of the original DNAs and the average nearest neighbor frequencies of those natural sequences across the F(GC) range, we showed that these differences in the T(m) vs. F(GC) relationships are largely a direct result of systematic F(GC)-dependent biases in nearest neighbor frequencies for those two different DNA classes. Those differences in the T(m) vs. F(GC) relationships and biases in nearest neighbor frequencies also appear between the sequences from multicellular and unicellular organisms in the same coding or non-coding classes, albeit of smaller but significant magnitudes.
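The two sequence descriptors used above, F(GC) and nearest neighbor frequencies, can be computed with a short sketch like the one below. It does not simulate melting and is not MELTSIM; it counts dinucleotides on a single strand, which is a simplification, since a duplex model would normally pool each doublet with its reverse complement. The function names are hypothetical.

```python
# Minimal sketch: F(GC) and single-strand nearest neighbor frequencies.
# Assumption: doublets are counted on one strand only, without merging
# reverse-complement pairs as a duplex thermodynamic model would.
from collections import Counter


def gc_fraction(seq):
    """Mole fraction of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("G") + seq.count("C")) / total


def nearest_neighbor_frequencies(seq):
    """Return {dinucleotide: frequency} over all overlapping doublets."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}
```

Comparing these frequencies between a natural sequence and a shuffled copy with the same F(GC) is the kind of contrast the abstract uses to attribute the slope difference to nearest neighbor biases.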
