Similar literature
20 similar documents found (search time: 31 ms)
1.
A circular code is a set of trinucleotides allowing the reading frames in genes to be retrieved locally, i.e. anywhere in genes and in particular without start codons, and automatically with a window of a few nucleotides. In 1996, a common circular code, called X, was identified in large populations of eukaryotic and prokaryotic genes. Hence, it is believed to be an ancestral structural property of genes. A new computational approach based on comparative genomics is developed to identify essential molecular functions associated with circular codes. It is based on a quantitative and sensitive statistical method (FPTF) to identify three permuted trinucleotide sets in the three frames of genes, a flower automaton algorithm to determine whether a trinucleotide set is a circular code, and an integrated Gene Ontology and Taxonomy (iGOT) database. By carrying out automatic circular code analyses on a huge number of gene populations, where each population is associated with a particular molecular function, it identifies 266 gene populations having circular codes close to X. Surprisingly, their molecular functions include 98% of those covered by the essential genes of the DEG database (Database of Essential Genes). Furthermore, three trinucleotides GTG, AAG and GCG, replacing three trinucleotides of the code X and called “evolutionary” trinucleotides, occur significantly in these 266 gene populations. Finally, a new method developed to analyse and quantify the stability of a set of trinucleotides demonstrates that these evolutionary trinucleotides are associated with a significant increase in the stability of the common circular code X. Indeed, its stability increases from the 1502nd rank to the 16th rank after the replacement of the three evolutionary trinucleotides among 9920 possible trinucleotide replacement sets.

2.
We develop here a new class of stochastic models of gene evolution in which a random subset of the 64 possible trinucleotides mutates at each evolutionary time t according to some time dependent substitution probabilities. Therefore, at each time t, the numbers and the types of mutable trinucleotides are unknown. Thus, the mutation matrix changes at each time t. The pseudochaotic model developed here generalizes the standard model in which all the trinucleotides mutate at each time t. It determines the occurrence probabilities at time t of trinucleotides which pseudochaotically mutate according to 3 time dependent substitution parameters associated with the 3 trinucleotide sites. The main result proves that under suitable assumptions, this pseudochaotic model converges to a uniform probability vector identical to that of the standard model. Furthermore, an application of this pseudochaotic model allows an evolutionary study of the 3 circular codes identified in both eukaryotic and prokaryotic genes. A circular code is a particular set of trinucleotides whose main property is the retrieval of the frames in genes locally, i.e., anywhere in genes and particularly without start codons, and automatically with a window of a few nucleotides. After a certain evolutionary time and with particular time dependent functions for the 3 substitution parameters, precisely an exponential decrease in the 1st and 2nd trinucleotide sites and an exponential increase in the 3rd one, this pseudochaotic model retrieves the main statistical properties of the 3 circular codes observed in genes. Furthermore, it leads to a circular code asymmetry stronger than that of the standard (nonpseudochaotic) model and, therefore, to a better correlation with the genes.

3.
A circular code has been identified in the protein (coding) genes of both eukaryotes and prokaryotes by using a statistical method called the trinucleotide frequency (TF) method [Arquès & Michel (1996). J. theor. Biol. 182, 45-58]. Recently, a probabilistic model based on the nucleotide frequencies, with a hypothesis of absence of correlation between successive bases on a DNA strand, has been proposed by Koch & Lehmann [(1997). J. theor. Biol. 189, 171-174] for constructing some particular circular codes. Their interesting method, which we call here the nucleotide frequency (NF) method, shows several limitations in constructing the circular code observed in protein genes.

4.
We develop here a new class of stochastic models of gene evolution in which the mutations are chaotic, i.e. a random subset of the 64 possible trinucleotides mutates at each evolutionary time t according to some substitution probabilities. Therefore, at each time t, the numbers and the types of mutable trinucleotides are unknown. Thus, the mutation matrix changes at each time t. The chaotic model developed generalizes the standard model in which all the trinucleotides mutate at each time t. It determines the occurrence probabilities at time t of trinucleotides which chaotically mutate according to three substitution parameters associated with the three trinucleotide sites. Two theorems prove that this chaotic model has a probability vector at each time t and that it converges to a uniform probability vector identical to that of the standard model. Furthermore, four applications of this chaotic model (with a uniform random strategy for the 64 trinucleotides and with a particular strategy for the three stop codons) allow an evolutionary study of the three circular codes identified in both eukaryotic and prokaryotic genes. A circular code is a particular set of trinucleotides whose main property is the retrieval of the frames in genes locally, i.e. anywhere in genes and particularly without start codons, and automatically with a window of a few nucleotides. After a certain evolutionary time and with particular values for the three substitution parameters, the chaotic models retrieve the main statistical properties of the three circular codes observed in genes. These applications also allow an evolutionary comparison between the standard and chaotic models.
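The difference between the standard and the chaotic model can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a discrete-time process in which a mutating trinucleotide substitutes site i with per-step probability `rates[i]`, choosing uniformly among the three other nucleotides. The names `step` and `evolve`, the `chaotic` flag and the way the random subset is drawn (uniform size between 0 and 64) are all hypothetical choices made for illustration.

```python
import random

NUCLEOTIDES = "ACGT"
TRINUCLEOTIDES = [a + b + c for a in NUCLEOTIDES
                  for b in NUCLEOTIDES for c in NUCLEOTIDES]


def step(probs, rates, mutable=None):
    """Apply one evolutionary step to a trinucleotide probability vector.

    probs   : dict mapping trinucleotide -> occurrence probability
    rates   : (p, q, r), assumed per-step substitution probabilities
              at sites 1, 2 and 3 (their sum must not exceed 1)
    mutable : set of trinucleotides allowed to mutate at this step;
              None means all 64 mutate (the standard model)
    """
    new = dict.fromkeys(TRINUCLEOTIDES, 0.0)
    stay = 1.0 - sum(rates)
    for word, pw in probs.items():
        if mutable is not None and word not in mutable:
            new[word] += pw  # frozen at this step
            continue
        new[word] += pw * stay
        for site, rate in enumerate(rates):
            for n in NUCLEOTIDES:
                if n != word[site]:
                    mutant = word[:site] + n + word[site + 1:]
                    new[mutant] += pw * rate / 3.0
    return new


def evolve(probs, rates, steps, chaotic=False, seed=0):
    """Iterate `step`; in chaotic mode a random subset mutates each time."""
    rng = random.Random(seed)
    for _ in range(steps):
        mutable = None
        if chaotic:
            size = rng.randint(0, len(TRINUCLEOTIDES))
            mutable = set(rng.sample(TRINUCLEOTIDES, size))
        probs = step(probs, rates, mutable)
    return probs


if __name__ == "__main__":
    start = {"ATG": 1.0}
    standard = evolve(start, (0.01, 0.01, 0.08), 3000)
    print(max(abs(v - 1 / 64) for v in standard.values()))
```

Under these assumptions the standard model drifts towards the uniform vector (1/64 for every trinucleotide), which mirrors the convergence stated in the abstract; the chaotic variant only changes which columns of the mutation matrix are active at each step.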

5.
We develop here a new class of gene evolution models in which the nucleotide mutations are time dependent. These models allow the study of nonlinear gene evolution by accelerating or decelerating the mutation rates at different evolutionary times. They generalize the previous ones, which are based on constant mutation rates. The stochastic model developed in this class determines at some time t the occurrence probabilities of trinucleotides mutating according to 3 time dependent substitution parameters associated with the 3 trinucleotide sites. Therefore, it allows simulation of the evolution of the circular code recently observed in genes. By varying the class of functions for the substitution parameters, 1 among 12 models retrieves after mutation the statistical properties of the observed circular code in the 3 frames of actual genes. In this model, the mutation rate in the 3rd trinucleotide site increases during gene evolution while the mutation rates in the 1st and 2nd sites decrease. This property agrees with the actual degeneracy of the genetic code. This approach can easily be generalized to study the evolution of motifs of various lengths, e.g., dicodons, with time dependent mutations.

6.
Comma-free codes constitute a class of circular codes which has been widely studied, in particular by Golomb et al. (Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab 23:1–34, 1958a, Can J Math 10:202–209, 1958b), Michel et al. (Comput Math Appl 55:989–996, 2008a, Theor Comput Sci 401:17–26, 2008b, Inf Comput 212:55–63, 2012), Michel and Pirillo (Int J Comb 2011:659567, 2011), and Fimmel and Strüngmann (J Theor Biol 389:206–213, 2016). Based on a recent approach using graph theory to study circular codes (Fimmel et al., Philos Trans R Soc 374:20150058, 2016), a new class of circular codes, called strong comma-free codes, is identified. These codes detect a frameshift during the translation process immediately after a reading window of at most two nucleotides. We describe several combinatorial properties of strong comma-free codes: enumeration, maximality, self-complementarity and \(CF^3\)-property (comma-free property in all the three possible frames). These combinatorial results also highlight some new properties of the genetic code and its evolution. Each amino acid in the standard genetic code is coded by at least one strong comma-free code of size 1. There are 9 amino acids \(S=\{Asn,Asp,Gln,Gly,Lys,Met,Phe,Pro,Trp\}\) among 20 such that for each amino acid from S, its synonymous trinucleotide set (excluding the necessary periodic trinucleotides \(\{AAA,CCC,GGG,TTT\}\)) is a strong comma-free code. The primeval comma-free RNY code of Eigen and Schuster (Naturwissenschaften 65:341–369, 1978) is a self-complementary \(CF^3\)-code of size 16. Furthermore, it is the union of two strong comma-free codes of size 8 which are complementary to each other.
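The claims about the RNY code (size 16, comma-free, self-complementary) can be checked directly. The sketch below uses the classical definition of comma-freeness, namely that for any two code words x and y, neither of the two trinucleotides read in the shifted frames of xy belongs to the code; it does not implement the strong comma-free property, whose exact definition is not given in the abstract. The function names are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reverse complement of a nucleotide word."""
    return "".join(COMPLEMENT[n] for n in reversed(word))


def is_comma_free(code):
    """True if no code word appears in frame 1 or 2 of any concatenation xy."""
    code = set(code)
    for x, y in product(code, repeat=2):
        pair = x + y
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True


def is_self_complementary(code):
    """True if the code equals the set of its reverse complements."""
    code = set(code)
    return {reverse_complement(w) for w in code} == code


# RNY: R is a purine (A or G), N any nucleotide, Y a pyrimidine (C or T).
RNY = {r + n + y for r in "AG" for n in "ACGT" for y in "CT"}

if __name__ == "__main__":
    print(len(RNY), is_comma_free(RNY), is_self_complementary(RNY))
```

The pairwise check is quadratic in the code size, which is negligible for trinucleotide codes of at most 20 words.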

7.
The mouse mitochondrial DNA origin of light-strand replication has been defined as a 32-nucleotide region located among five transfer RNA genes in the genomic sequence. A distinctive feature of this origin is its potential to form a perfectly complementary stem and 11-nucleotide loop structure. Previous studies have demonstrated that the 5′ ends of nascent light strands map within this region and a major trinucleotide ribosubstitution site in closed circular mouse mitochondrial DNA has been mapped within the stem sequence. Direct analysis and precise localization of the 5′ ends of nascent light strands indicate that essentially all 5′ ends are ribonucleotides mapping in the origin-specific dyadic structure. The major 5′ end identified is the rG at position 5187 in the genomic sequence. Priming of replication most likely occurs within the loop portion of the potential dyad and continues for 2 to 16 nucleotides with a sharply defined switch to deoxyribonucleotide synthesis. This functional transition point is identical in map position to the trinucleotide ribosubstitution site in mature, closed circular mitochondrial DNA.

8.
The subset X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 20 trinucleotides has a preferential occurrence in the frame 0 (reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0 is a complementary maximal circular code with two permuted maximal circular codes X1 and X2 in the frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). X0 is called a C3 code (Arquès and Michel, 1997, J. Biosyst 44, 107-134). A quantitative study of these three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of X0, X1 and X2 in the frame 0 of eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5' and 3' regions of eukaryotes, where X0, X1 and X2 occur with variable frequencies around the random value (1/3). Several frequency asymmetries unexpectedly observed, e.g. the frequency difference between X1 and X2 in the frame 0, are related to a new property of the C3 code X0 involving substitutions. An evolutionary analytical model with three parameters (p, q, t), based on an independent mixing of the 20 codons (trinucleotides in the frame 0) of X0 with equiprobability (1/20) followed by t ≈ 4 substitutions per codon according to the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 in the three codon sites respectively, retrieves the frequencies of X0, X1 and X2 observed in the three frames of protein genes and explains these asymmetries. The complex behaviour of these analytical curves is totally unexpected and a priori difficult to imagine. Finally, the evolutionary analytical method developed could be applied to phylogenetic tree reconstruction and DNA sequence alignment.
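The structural properties stated for X0 can be verified mechanically. The sketch below assumes that "complementary" means closed under reverse complement and that X1 and X2 are obtained from X0 by circular permutation of each trinucleotide by one and two positions; the helper names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The 20 trinucleotides of X0 as listed in the abstract.
X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def reverse_complement(word):
    """Return the reverse complement of a nucleotide word."""
    return "".join(COMPLEMENT[n] for n in reversed(word))


def complement_set(code):
    """Reverse-complement every word of a code."""
    return {reverse_complement(w) for w in code}


def permute(code, shift):
    """Circularly permute every word: b1b2b3 -> b2b3b1 for shift 1."""
    return {w[shift:] + w[:shift] for w in code}


X1 = permute(X0, 1)
X2 = permute(X0, 2)
PERIODIC = {"AAA", "CCC", "GGG", "TTT"}

if __name__ == "__main__":
    print("X0 self-complementary:", complement_set(X0) == X0)
    print("X1 and X2 complementary to each other:", complement_set(X1) == X2)
    print("X0, X1, X2 and the periodic words cover all 64:",
          len(X0 | X1 | X2 | PERIODIC) == 64)
```

Because the three codes are pairwise disjoint and exclude the four periodic trinucleotides, they partition the remaining 60 trinucleotides, which is why the random frequency of each set is 1/3.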

9.
In several papers Arquès and Michel studied the maximal circular codes consisting of words of length 3 (or trinucleotides) on the genetic alphabet {A, C, G, T}. We present here some additional information on these codes. In particular, we study the growth function of the self-complementary circular codes and we prove that among them exactly 528 are maximal.

10.
In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has attracted great interest and undergone rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes, with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C3 maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but that there is still great variability among sequences. Second, we focus on this code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored so as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes to reading frame synchronization.
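One plausible reading of "covering capability" is the fraction of trinucleotides of a sequence, read in a given frame, that belong to the code. The sketch below implements that reading only; it is an assumption, not the statistic defined in the paper, and the functions `coverage` and `best_frame` are hypothetical. The code X0 is the set of 20 trinucleotides listed in entry 8.

```python
X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def coverage(sequence, code, frame):
    """Fraction of complete trinucleotides in `frame` (0, 1 or 2) found in `code`."""
    words = [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]
    if not words:
        return 0.0
    return sum(w in code for w in words) / len(words)


def best_frame(sequence, code):
    """Frame with the highest coverage (ties resolved in favour of the lowest frame)."""
    return max(range(3), key=lambda f: coverage(sequence, code, f))


if __name__ == "__main__":
    # Toy sequence built from five X0 words: GAA CTG GTC TAC ATT.
    seq = "GAACTGGTCTACATT"
    for f in range(3):
        print(f, coverage(seq, X0, f))
    print("best frame:", best_frame(seq, X0))
```

A real analysis would compare such coverages against a null distribution, for instance through the permutation tests mentioned in the abstract, rather than reading a single ratio.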

11.
Self-complementary circular codes are involved in pairing genetic processes. A maximal \(C^3\) self-complementary circular code X of trinucleotides was identified in genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel in Life 7(20):1–16, 2017, J Theor Biol 380:156–177, 2015; Arquès and Michel in J Theor Biol 182:45–58, 1996). In this paper, self-complementary circular codes are investigated using the graph theory approach recently formulated in Fimmel et al. (Philos Trans R Soc A 374:20150058, 2016). A directed graph \(\mathcal {G}(X)\) associated with any code X mirrors the properties of the code. In the present paper, we demonstrate a necessary condition for the self-complementarity of an arbitrary code X in terms of graph theory. The same condition has been proven to be sufficient for codes which are circular and of large size \(\mid X \mid \ge 18\) trinucleotides, in particular for maximal circular codes (\(\mid X \mid = 20\) trinucleotides). For codes of small size \(\mid X \mid \le 16\) trinucleotides, some very rare counterexamples have been constructed. Furthermore, the length and the structure of the longest paths in the graphs associated with the self-complementary circular codes are investigated. It has been proven that the longest paths in such graphs determine the reading frame for the self-complementary circular codes. By applying this result, the reading frame in any arbitrary sequence of trinucleotides is retrieved after at most 15 nucleotides, i.e., 5 consecutive trinucleotides, from the circular code X identified in genes. Thus, an X motif of a length of at least 15 nucleotides in an arbitrary sequence of trinucleotides (not necessarily all of them belonging to X) uniquely defines the reading (correct) frame, an important criterion for analyzing the X motifs in genes in the future.
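The graph approach lends itself to a short circularity test. The sketch below assumes the construction of Fimmel et al. (2016), in which each trinucleotide b1b2b3 of X contributes the two edges b1 → b2b3 and b1b2 → b3 to \(\mathcal {G}(X)\), and the result that X is circular exactly when this graph has no directed cycle. The function names are illustrative, and X is taken to be the 20 trinucleotides listed in entry 8.

```python
X = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def code_graph(code):
    """Build the directed graph of a trinucleotide code as an adjacency dict."""
    edges = {}
    for w in code:
        edges.setdefault(w[0], set()).add(w[1:])   # b1 -> b2b3
        edges.setdefault(w[:2], set()).add(w[2])   # b1b2 -> b3
    return edges


def is_acyclic(edges):
    """Depth-first search with three colours; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(v):
        colour[v] = GREY
        for u in edges.get(v, ()):
            state = colour.get(u, WHITE)
            if state == GREY:
                return False
            if state == WHITE and not visit(u):
                return False
        colour[v] = BLACK
        return True

    for v in list(edges):
        if colour.get(v, WHITE) == WHITE and not visit(v):
            return False
    return True


def is_circular(code):
    """A trinucleotide code is circular iff its graph is acyclic (assumed theorem)."""
    return is_acyclic(code_graph(code))


if __name__ == "__main__":
    print(is_circular(X))               # the code identified in genes
    print(is_circular({"ACG", "CGA"}))  # two conjugate words form a cycle
```

The graph has at most 4 nucleotide and 16 dinucleotide vertices, so recursion depth and running time are never a concern here.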

12.
Gaĭtskoki VS, Patkin EL. Genetika 2000, 36(7):869-886
The new data on the mechanisms underlying trinucleotide repeat expansion are reviewed with special reference to the role of chromatin structure, DNA replication, methylation, and amplification in repeat expansion during ontogeny. A hypothesis is advanced as to the crucial role of processes that occur at the preimplantation developmental stage, such as sister chromatid exchanges, single-strand DNA breaks, and demethylation. The molecular and cellular events responsible for association between trinucleotide expansion and various diseases are discussed.

13.
14.
We have developed the first set of trinucleotide and tetranucleotide markers for the Japanese flounder, Paralichthys olivaceus. One hundred and sixty-seven polymorphic trinucleotide and tetranucleotide microsatellites were isolated using clones derived from two libraries. Of almost 200,000 clones analysed, 0.5% presented trinucleotide or tetranucleotide repeat regions. Among the trinucleotide repeats analysed in this study, the most frequent one was (CAG)(n) and the most common tetranucleotide repeat was (GATA)(n). The position of the new markers in the genetic linkage map was determined. Markers were evenly distributed along the P. olivaceus linkage groups, without distinction between the kinds of repeats and library of origin. The markers isolated in this study contribute significantly to the genetic linkage map of the Japanese flounder.

15.
Inoue S, Takahashi K, Ohta M. Genomics 1999, 57(1):169-172
A method was developed for effective isolation of trinucleotide repeats from genomic DNA. This method is based on the DNA polymerase reaction, which is restricted with only two or three nucleotide substrates and primed by biotinylated oligonucleotide probes. Sequences are then isolated by a streptavidin biotin-trapping method. More than 80% of the clones from each library contained more than eight trinucleotide repeats. Sequence analysis showed that the characteristic dinucleotide flanking sequences usually confronting various trinucleotide repeats are not found in the vicinity of CAG repeats, suggesting that CAG repeats may have been generated through a mechanism different from that of other trinucleotide repeats.

16.
A new method to produce a set of 20 high-quality trinucleotide phosphoramidites on a 5-10 g scale each was developed. The procedure starts with condensation reactions of P-components with N-acyl nucleosides, bearing the 3′-hydroxyl function protected with 2-azidomethylbenzoyl, to give fully protected dinucleoside phosphates 13. Upon cleavage of the dimethoxytrityl group from 13, dinucleoside phosphates 16 are initially transformed into trinucleoside diphosphates 19, and then the 2-azidomethylbenzoyl is selectively removed under neutral conditions to generate trinucleoside diphosphates 5 in excellent yield. Subsequent 3′-phosphitylation affords target trinucleotide phosphoramidites 7. When mutagenic oligonucleotides are synthesized employing mixtures of building blocks 7 and following the new synthetic protocol, representative oligonucleotide libraries are generated in good yields.

17.
The ataxias are a complex group of diseases with both environmental and genetic causes. Among the autosomal dominant forms of ataxia, the genes for two, spinocerebellar ataxia type 1 (SCA1) and Machado-Joseph disease (MJD), have been isolated. In both of these disorders the molecular basis of disease is the expansion of an unstable CAG trinucleotide repeat. To assess the frequency of the SCA1 and MJD trinucleotide repeat expansions among individuals diagnosed with ataxia, we have collected DNA from individuals representing 311 families with adult-onset ataxia of unknown etiology and screened these samples for trinucleotide repeat expansions within the SCA1 and MJD genes. Within this group there are 149 families with dominantly inherited ataxia. Of these, 3% had SCA1 trinucleotide repeat expansions, whereas 21% were positive for the MJD trinucleotide expansion. Thus, together SCA1 and MJD represent 24% of the autosomal dominant ataxias in our group, and the frequency of MJD is substantially greater than that of SCA1. For the 57 patients with MJD trinucleotide repeat expansions, a strong inverse correlation between CAG repeat size and age at onset was observed (r = -0.838). Among the MJD patients, the normal and affected ranges of CAG repeat size are 14-40 and 68-82 repeats, respectively. For SCA1 the normal and affected ranges are much closer, containing 19-38 and 40-81 CAG repeats, respectively.

18.
Three sets of 20 trinucleotides are preferentially associated with the reading frames and their 2 shifted frames of both eukaryotic and prokaryotic genes. These 3 sets are circular codes. They allow retrieval of any frame in genes (containing these circular code words), locally anywhere in the 3 frames and in particular without start codons in the reading frame, and automatically with the reading of a few nucleotides. The circular code in the reading frame, denoted X, from which the 2 other circular codes in the shifted frames can be deduced by permutation, is the information used for analysing frameshift genes, i.e. genes with a change of reading frame during translation. This work studies the circular code signal around their frameshift sites. Two scoring methods are developed, a function P based on this code X and a function Q based both on this code X and the 4 trinucleotides with identical nucleotides. They detect a significant correlation between the code X and the -1 frameshift signals in both eukaryotic and prokaryotic genes, and the +1 frameshift signals in eukaryotic genes.

19.
The trinucleotide repeats that expand to cause human disease form hairpin structures in vitro that are proposed to be the major source of their genetic instability in vivo. If a replication fork is a train speeding along a track of double-stranded DNA, the trinucleotide repeats are a hairpin curve in the track. Experiments have demonstrated that the train can become derailed at the hairpin curve, resulting in significant damage to the track. Repair of the track often results in contractions and expansions of track length. In this review we introduce the in vitro evidence for why CTG/CAG and CCG/CGG repeats are inherently unstable and discuss how experiments in model organisms have implicated the replication, recombination and repair machinery as contributors to trinucleotide repeat instability in vivo.

20.
In fungi, microsatellites occur less frequently throughout the genome and tend to be less polymorphic compared with other organisms. Most studies that develop microsatellites for fungi focus on dinucleotide and trinucleotide repeats, and thus mononucleotide repeats, which are much more abundant in fungal genomes, may represent an overlooked resource. This study examined the relative probabilities of polymorphism in mononucleotide, dinucleotide and trinucleotide repeats in Aspergillus nidulans. As previously found, the probability of polymorphism increased with increasing number of repeating units. Dinucleotide and trinucleotide repeats had higher probabilities of polymorphism than mononucleotide repeats, but this was offset by the presence of numerous long mononucleotide repeats within the genome. Mononucleotide microsatellites with 20 or more repeating units have a probability of polymorphism similar to dinucleotide and trinucleotide microsatellites, and therefore, consideration of mononucleotide repeats will substantially increase the number of potential markers available.
