Similar articles
1.
Summary: Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either α-helical or β-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the downstream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
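The frame argument in this abstract can be checked with a short sketch. The heptamer `GCAGCTG` and the helper names below are illustrative assumptions, not taken from the paper; the code only uses the standard genetic code and translates a tandem repeat in the three forward frames.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def repeat_frames(unit, copies=12):
    """Return the three forward-frame translations of a tandem repeat of unit."""
    dna = unit * copies
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    # Hypothetical heptamer chosen only because none of its cyclic triplets is a stop codon.
    for frame, peptide in enumerate(repeat_frames("GCAGCTG")):
        print(frame, peptide)
```

For this hypothetical unit the three frames start with AAGSWQL, QLAAGSW and SWQLAAG: the same heptapeptide repeat in three rotations, with no stop codon in any frame, which is the behaviour the abstract describes for units whose length is not a multiple of 3.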

2.
S. OHNO 《Animal genetics》1988,19(4):305-316
Inasmuch as all events in this universe are governed by multitudes of periodicities, it is a mistake to regard any coding sequence as unique, implying descent from random assemblages of four bases. Instead, each coding sequence is comprised of primordial and derived repeating units. In the case of families of proteins with transmembrane alpha-helices, the primordial repeating units of their coding sequences were base heptamers, thus giving the original polypeptide chains a heptapeptidic periodicity very conducive to alpha-helix formation. Even in modern coding sequences for these families of proteins, intact and base-substituted copies of these primordial heptamers are found in more or less even distribution along the entire coding sequence. In addition, there are now locally prominent tandemly recurring units that are only remotely related to primordial heptamers. In the case of the Ca++ channel, local prominence of one such nonameric unit gave a unique tripeptidic periodicity to the fourth helix of each unit, giving it a girdle of positively charged residues. All these complex interplays between primordial and derived recurring units that characterize each coding sequence can best be appreciated by their musical transformation. The transformed musical score of a pertinent part of the rabbit skeletal muscle Ca++ channel coding sequence is given.

3.
One third of a collection of cloned Stylonychia pustulata micronuclear DNA PstI fragments were found to be of a similar size, consistent with their being members of a repetitious sequence family with a repeat size of about 160 base pairs. Cross-hybridization experiments confirmed that these small cloned fragments are related by sequence homology. Hybridization of the cloned repetitious sequences to PstI digested micronuclear DNA revealed a “ladder” of bands (step size = 160 base pairs), indicating that the repeats are found in tandem arrays. This is the first demonstration of highly repetitious, tandemly repeated sequences in a ciliated protozoan.

4.
We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists.

5.
X-ray fiber diffraction studies of satellite DNAs from Gecarcinus lateralis, Drosophila virilis and Mus musculus, all of which have highly repetitious base sequences but with different degrees of sequence complexity, reveal only classical polynucleotide duplex structures in contrast to some highly repetitious synthetic DNAs.

6.
The observed frequency of folded rings has been determined as a function of fragment length and degree of resection for DNA from mouse and Necturus. The thermal stability of the ring closure and the kinetics of ring formation have been studied. As seen in the case of Drosophila DNA, mouse and Necturus DNA display a decreasing frequency of folded rings as fragment length increases. We interpret this to mean that repetitious sequences of a given type are clustered into many thousands of characteristic regions, called g-regions. The present paper focuses on the interior organization of g-regions. Variations of two competing models may be entertained: “tandem repetition” and “intermittent repetition”. If the g-regions were composed of exact, tandemly-repeating sequences, all observations could be easily explained. In order to maintain the idea that the g-regions contain repetitious blocks located at regular or irregular intervals, one must suppose that such repetitious blocks are long (>200 nucleotide pairs), not internally repetitious, and represent perhaps 80% of the nucleotides in the g-region. Such a sequence can be thought of as a fractional-tandem repeat. For example: HIJXXXABC … HIJXXXABC … HIJXXX, where the X's stand for nucleotides composing sequences that are unrelated to each other, and the letters (ABC … HIJ) represent nucleotides in the non-internally-repetitive repeating sequence. We feel that debate can now be profitably devoted to the question of whether approximately 80 or 100% of the tandemly-repetitious unit is in fact tandem.

7.
8.
The ciliated protozoa exhibit nuclear dimorphism. The genome of the somatic macronucleus arises from the germ-line genome of the micronucleus following conjugation. We have studied the fates of highly repetitious sequences in this process. Two cloned, tandemly repeated sequences from the micronucleus of Oxytricha fallax were used as probes in hybridizations to micronuclear and macronuclear DNA. The results of these experiments show: (1) the cloned repeats are members of two apparently unrelated repetitious sequence families, which each appear to comprise a few percent of the micronuclear genome, and (2) the amount of either family in the macronuclei from which our DNA was prepared is about 1/15 that found in an equal number of diploid micronuclei. Most, if not all, of the apparent macronuclear copies of these repeats can be accounted for by micronuclear contamination, which strongly suggests that these sequences are eliminated from the macronuclei and have no vegetative function.

9.
DNA fragments partially digested by a 3′- or 5′-specific nuclease to produce single chain ends of opposite polarity will form a ring if the ends contain complementary sequences and are allowed to anneal. The frequency of rings can then be used as an assay to determine where and how identical repetitious sequences are arranged in the DNA. Thomas et al. (1973b) showed that all eucaryote chromosomes studied contain similar if not identical repetitious sequences clustered into regions called g-regions. To account for the observed ring frequency under different experimental conditions, Thomas, Zimm & Dancis (1973c) derived equations for two possible models of g-region organization. In the pure tandem model, the repetitious sequences are contiguous and occupy the entire g-region. In the intermittent repetition model, the repetitious sequences are simple copolymers and are irregularly arranged among non-repetitious sequences which are heterogeneous in length. In the present paper, the results of Thomas et al. (1973c) are extended to cover the fractional tandem model. In this model, adjacent repetitious sequences are separated by non-repetitious sequences of uniform length. In addition, the equations for both the pure tandem and intermittent repetition models are shown to be special cases of the fractional tandem model but not vice versa. The capabilities and limitations of an analysis of ring formation are demonstrated using data from Drosophila. Although it is not possible to rule out any of the three models, the analysis can limit the ranges of the parameters describing each of the models that are consistent with the data. Previous conclusions that the data could only be explained by a pure tandem model which lacks any intervening unique sequences (Bick, Huang & Thomas, 1973; Thomas et al., 1973b) are shown to be incorrect, in part because the equations for the fractional tandem model had not then been derived.
Thus ring theory equations can be used to show the presence of clusters of similar if not identical sequences from ring-forming experiments, but they may not be able to determine the exact spacing and arrangement of these sequences within the clusters.

10.
This report deals with the study of compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor base dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters here termed triplet composons. The triplets containing the same set of distinct bases in whatever order and number form a triplet composon. The reading of the DNA sequence is performed starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content in terms of the composon usage frequencies of the gene sequences shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even as their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans do not have their counterpart in mouse whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
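The 64-to-14 compression described in this abstract can be sketched as follows. This is a minimal illustration under two assumptions of mine: a composon is labelled by its sorted set of distinct bases, and "overlapping" reading means a window that slides one base at a time. The function names are hypothetical, not the authors'.

```python
from collections import Counter
from itertools import product


def composon(triplet):
    """Label a triplet by its set of distinct bases, e.g. 'GAG' -> 'AG'."""
    return "".join(sorted(set(triplet)))


def composon_usage(seq):
    """Composon frequencies over all overlapping triplets of seq (window step of 1)."""
    counts = Counter(composon(seq[i:i + 3]) for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# All 64 triplets collapse to 14 labels: 4 single bases, 6 base pairs, 4 base trios.
ALPHABET = sorted({composon("".join(t)) for t in product("ACGT", repeat=3)})
```

Because a label depends only on which bases occur in the triplet, `composon_usage` can be compared between sequences without fixing a reading frame, which matches the frame-independent analysis the abstract describes.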

11.
Yeramian E 《Gene》2000,255(2):151-168
A gene identification procedure is formulated, based on large-scale structural analyses of genomic sequences. The structural property is the physical - thermal - stability of the DNA double-helix, as described by the classical helix-coil model. The analyses are detailed for the Plasmodium falciparum genome, which represents one of the most difficult cases for the gene identification problem (notably because of the extreme AT-richness of the genome). In this genome, the coding domains (either uninterrupted genes or exons in split genes) are accurately identified as regions of high thermal stability. The conclusion is based on the study of the available cloned genes, of which 17 examples are described in detail. These examples demonstrate that the physical criterion is valid for the detection of coding regions whose lengths extend from a few base pairs up to several thousand base pairs. Accordingly, the structural analyses can provide a powerful and convenient tool for the identification of complex genes in the P. falciparum genome. The limits of such a scheme are discussed. The gene identification procedure is applied to the completely sequenced chromosomes (2 and 3), and the results are compared with the database annotations. The structural analyses suggest more or less extensive revision to the annotations, and also allow new putative genes to be identified in the chromosome sequences. Several examples of such new genes are described in detail.

12.
13.
Molecular aspects of chromatin elimination in Ascaris lumbricoides   (cited by: 5; self-citations: 0; citations by others: 5)
DNA from spermatids, 4-cell stages, and larvae of Ascaris lumbricoides was isolated, and the genome size before and after chromatin elimination was determined by isotope dilution. According to these determinations, 27% of the DNA is lost during the process of chromatin elimination. This value is based on the assumption that larval nuclei are diploid. The genomes were then characterized by CsCl equilibrium density gradient centrifugation and renaturation kinetics. The eliminated DNA does not differ from the retained DNA in base composition. About 26% of the DNA of 4-cell stage embryos sediments as a light satellite and was shown to be mitochondrial DNA by electron microscopy. Renaturation kinetics revealed that 10% of the retained somatic DNA is repetitious with an average family size of 5500 to 7000 copies, whereas 90% of the retained DNA is presumably composed of unique sequences. By contrast, germ-line DNA contains 23% fast renaturing DNA with a family size of 7000 to 10,000 copies. Thus, eliminated DNA consists of repetitious and unique sequences in a ratio of about 1:1.

14.
Both genetic and musical sequences are ordered structures composed of combinations of a small number of elements, of nucleotides and musical notes. In the case of the genome, the emergence of cellular functions makes the order meaningful; in the case of musical sequences, the consequence of order is the production of mysterious esthetical effects in the human mind. Can any musical significance be found in DNA sequence? In this work, we present the technique used to convert DNA sequences into musical sequences. The musical equivalent of the sequence of a number of genes, either of fungal origin, such as Candida albicans or Saccharomyces cerevisiae (SLT2), or belonging to the human genome (genes involved in Alzheimer syndrome, blindness, and deafness such as the Connexin 26 gene) has been obtained. Non-coding sequences are also important in life and music. The non-coding alphoid sequence has also been translated into a musical sequence, in this case using the Fibonacci golden number basic series as structural helper. The elementary musical sequence derived from DNA sequence has served as an imposing frame in which rhythms, sounds, and melodies have been harmonically inserted. The Genoma Music Project is essentially a creative metaphor of the basic unity between the human mind and the natural ordered structure of life.

15.
Restriction endonuclease cleavage of satellite DNA in intact bovine nuclei   (cited by: 1; self-citations: 0; citations by others: 1)
Lolya Lipchitz  Richard Axel 《Cell》1976,9(2):355-364
We have analyzed the efficiency with which specific nucleotide sequences within nucleosomes are recognized and cleaved by DNA restriction endonucleases. A system amenable to this sort of analysis is the cleavage of the bovine genome with the restriction endonuclease EcoRI. Bovine satellite I comprises 7% of the genome and is tandemly repetitious with an EcoRI site at 1400 base pair (bp) intervals within this sequence. The ease with which this restriction fragment can be measured permits an analysis of the accessibility of this sequence when organized in a nucleosomal array. Initial studies indicated that satellite I sequences are organized in a nucleosomal structure in a manner analogous to that observed for total genomic DNA. We then examined the accessibility of the EcoRI cleavage sites in satellite to endonucleolytic cleavage in intact nuclei. We find that whereas virtually all the satellite I sequences from naked DNA are cleaved into discrete 1400 bp fragments, only 33% of the satellite I DNA is liberated as this fragment from intact nuclei. These data indicate that 57% of the EcoRI sites in nuclei are accessible to cleavage and that cleavage can occur within the core of at least half the nucleosomal subunits. Analysis of the products of digestion suggests a random distribution of nucleosomes about the EcoRI sites of satellite I DNA. Finally, the observation that satellite sequences can be cleaved from nuclei to 1400 bp length fragments with their associated proteins provides a method for the isolation of specific sequences as chromatin. Using sucrose gradient velocity centrifugation, we have isolated a 70% pure fraction of satellite I chromatin. Nuclease digestion of this chromatin fraction reveals the presence of nucleosomal subunits and indicates that specific sequences can be isolated in this manner without gross disorganization of their subunit structure.
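The step from 33% to 57% in this abstract is consistent with a simple model, sketched here as an assumption rather than as the authors' stated derivation: a 1400 bp monomer is released only when the EcoRI sites at both of its ends are cut, and each site is cut independently with the same probability p, so the monomer fraction is p squared. The function name is hypothetical.

```python
import math


def site_accessibility(monomer_fraction):
    """Per-site cleavage probability p, assuming monomer_fraction = p * p
    (both flanking sites must be cut, and sites are cut independently)."""
    if not 0.0 <= monomer_fraction <= 1.0:
        raise ValueError("monomer_fraction must lie between 0 and 1")
    return math.sqrt(monomer_fraction)


if __name__ == "__main__":
    print(round(site_accessibility(0.33), 2))
```

Under this independence assumption, a monomer yield of 0.33 corresponds to p of about 0.57, the accessibility figure quoted in the abstract.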

16.
The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be categorized into three clusters (A, B, C), according to their nucleotide composition at three codon positions. Coding potential was found to be responsible for the phenomenon of three clusters in a 9-dimensional space derived from the nucleotide composition of ORFs: ORFs assigned to cluster A are coding ones, while those assigned to clusters B and C are non-coding ORFs. A "codingness" index called the AZ score is defined based on a clustering method used to recognize protein-coding genes in the A. pernix genome. The criterion for a coding or non-coding ORF is based on the AZ score. ORFs with AZ > 0 or AZ < 0 are coding or non-coding, respectively. Consequently, 620 out of 632 ORFs with putative functions based on the original annotation are contained in cluster A, which have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins newly added in RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, which is significantly less than 2694 in the original annotation and also much less than 1841 in the RefSeq annotation curated by NCBI staff. Annotation information of re-recognized genes and their AZ scores are available at: http://tubic.tju.edu.cn/Aper/.

17.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a large number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.

18.
Single-stranded DNA (ssDNA) isolated from (and amounting to 1.5-2% of) native nuclear DNA of cultured embryonic chicken cells labelled for 1-2 days with 3H-thymidine was analyzed by self-hybridization, hydroxyapatite chromatography (HAC), partial digestion with S1 nuclease, and isopycnic centrifugation. Two main fractions were rehybridized to excess amounts of bulk nuclear DNA or total cytoplasmic RNAs. The major fraction, equivalent to 75% of total ssDNA, consists of unique DNA sequences, apparently derived from multiple coding regions of the cell genome, since they are not self-reassociating but are hybridizable to the non-repetitious portion of bulk nuclear DNA and 40-45% of them are complementary to cell RNAs. About half of these ssDNA sequences hybridizable to cell RNAs seem to be closely connected with molecules belonging to the minor ssDNA fraction. The latter fraction consists of self-reassociating, moderately repeated DNA sequences, mainly derived from non-coding regions of the cell genome. These findings are discussed in the light of others, showing interspersion of coding and non-coding DNA sequences and susceptibility of active genes to certain nuclease attacks.

19.
Mechanistic constraints on diversity in human V(D)J recombination.   (cited by: 12; self-citations: 1; citations by others: 11)
We have analyzed a large collection of coding junctions generated in human cells. From this analysis, we infer the following about nucleotide processing at coding joints in human cells. First, the pattern of nucleotide loss from coding ends is influenced by the base composition of the coding end sequences. AT-rich sequences suffer greater loss than do GC-rich sequences. Second, inverted repeats can occur at ends that have undergone nucleolytic processing. Previously, inverted repeats (P nucleotides) have been noted only at coding ends that have not undergone nucleolytic processing, this observation being the basis for a model in which a hairpin intermediate is formed at the coding ends early in the reaction. Here, inverted repeats at processed coding ends were present at approximately twice the number of junctions as P nucleotide additions. Terminal deoxynucleotidyl transferase (TdT) is required for the appearance of the inverted repeats at processed ends (but not full-length coding ends), yet statistical analysis shows that it is virtually impossible for the inverted repeats to be polymerized by TdT. Third, TdT additions are not random. It has long been noted that TdT has a G utilization preference. In addition to the G preference, we find that TdT adds strings of purines or strings of pyrimidines at a highly significant frequency. This tendency suggests that nucleotide-stacking interactions affect TdT polymerization. All three of these features place constraints on the extent of junctional diversity in human V(D)J recombination.

20.
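A preliminary study of coding sequences lacking 3-base periodicity   (cited by: 4; self-citations: 0; citations by others: 4)
Fourier spectrum analysis of 120 relatively short coding sequences (<1,200 bp) shows that 3-base periodicity is not always present in short coding sequences. Statistical analysis suggests that whether a coding sequence exhibits 3-base periodicity is related to the base composition and distribution of the sequence, to the choice and order of amino acids in the encoded protein, and to the usage of synonymous codons. In general, the A+U content of non-period-3 sequences is higher than their G+C content, whereas the opposite holds for period-3 sequences; bases are distributed more evenly over the three codon positions in non-period-3 sequences than in period-3 sequences; and the codon and amino acid usage bias of non-period-3 sequences is weaker than that of period-3 sequences. These phenomena should be taken fully into account when Fourier analysis is used to predict genes and exons in DNA sequences.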
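A Fourier test for 3-base periodicity of the kind this abstract refers to can be sketched as below. This is a common textbook formulation (summed power of the four base-indicator sequences at frequency 1/3, compared with the mean power over the other frequencies) and is assumed here for illustration; the abstract does not give the authors' exact statistic or threshold, and the function names are hypothetical.

```python
import cmath


def power(seq, freq):
    """Summed spectral power of the four base-indicator sequences at freq (cycles per base)."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    total = 0.0
    for base in "ACGT":
        s = sum(cmath.exp(-2j * cmath.pi * freq * n)
                for n, b in enumerate(seq) if b == base)
        total += abs(s) ** 2
    return total


def period3_ratio(seq):
    """Power at frequency 1/3 divided by the mean power over k/N for k = 1 .. N//2.

    Requires len(seq) >= 2; uses a direct O(N^2) transform, adequate for short sequences.
    """
    n = len(seq)
    half = n // 2
    mean = sum(power(seq, k / n) for k in range(1, half + 1)) / half
    return power(seq, 1 / 3) / mean
```

A sequence with a strong codon-position bias gives a ratio well above 1, while a sequence without period-3 structure gives a ratio near or below 1. Where to place the cutoff between "period-3" and "non-period-3" is left open here, since the abstract does not state one.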
