Similar articles
1.
Summary Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, have been examined as expressions of first-, second-, or third-order Markov chains. Standard statistical tests found that most of the sequences required at least second-order Markov chains for their representation, and some required chains of third order. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and 56 of 64 were effective in predicting the three-step transition count matrices. The departure from random expectation of the observed first- and second-order transition count matrices meant that a considerable sample of eucaryotic nuclear DNA sequences, both protein coding and noncoding, have significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of the sequence. These results suggested that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
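
The transition count matrices mentioned in this abstract can be illustrated with a minimal sketch. It assumes an uppercase A/C/G/T string and uses a toy sequence; the function name `transition_counts` is hypothetical and is not taken from the study.

```python
from collections import Counter

def transition_counts(seq, order):
    """Count how often each context of `order` bases is followed by each base."""
    counts = Counter()
    for i in range(len(seq) - order):
        context = seq[i:i + order]   # the preceding `order` bases
        nxt = seq[i + order]         # the base that follows the context
        counts[(context, nxt)] += 1
    return counts

toy = "ACGTACGTAC"                    # toy sequence, not data from the study
first = transition_counts(toy, 1)     # first-order counts, e.g. ("A", "C")
second = transition_counts(toy, 2)    # second-order counts, e.g. ("AC", "G")
```

Dividing each count by the total for its context would give estimated transition probabilities; the statistical tests for chain order described in the abstract are not reproduced here.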

2.
Summary Coding sequences of eucaryotic nuclear DNA were characterized by an excess of short runs and a deficit of long runs of weak and of strong hydrogen bonding bases; non-coding sequences by a deficit of short runs and an excess of long runs, in the same of purines and of pyrimidines. The conservation of these attributes across DNA sequences coding for proteins of widely different function, across widely different eucaryotic species for the same protein and across related genes that diverged a long time ago and that now show large differences in base and, if coding, amino acid sequence suggested that these attributes have survival value. It was concluded that these attributes constitute probabilistic constraints on the primary structure (base sequence) of both coding and non-coding DNA.
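
As a rough illustration of what counting runs of weak and strong hydrogen bonding bases involves, the sketch below tallies run lengths for weak (A/T) and strong (G/C) bases. It assumes an uppercase A/C/G/T input; the name `run_length_histogram` is hypothetical, and the abstract's comparison against expected run counts is not implemented.

```python
from collections import Counter
from itertools import groupby

WEAK = set("AT")  # A-T pairs form two hydrogen bonds; G-C pairs form three

def run_length_histogram(seq):
    """Return {"W": Counter, "S": Counter} mapping run length -> number of runs."""
    hist = {"W": Counter(), "S": Counter()}
    classes = ("W" if base in WEAK else "S" for base in seq)
    for cls, group in groupby(classes):
        hist[cls][sum(1 for _ in group)] += 1
    return hist

hist = run_length_histogram("AATGCCCGTA")  # toy sequence: runs AAT, GCCCG, TA
```

The same function could be applied to a purine/pyrimidine classification by swapping the membership set.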

3.
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite the similarity in general trends, however, when we compare coding and noncoding matrices we find that there is a statistically significant difference between them even when we control for context. Most noticeably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical way of comparing substitution processes across sites, allowing us to better study variation in evolutionary processes across a genome. [Reviewing Editor: Dr. Martin Kreitman]

4.
By considering three DNA sequences simultaneously there is sufficient information to recover a full Markov model with three transition matrices from the root to each of the sequences. It is necessary to have relatively long sequences because, for nucleotides, the full model requires 39 parameters that are estimated from 63 observable values. This triplet Markov method is evaluated for the protein coding genes of mammalian vertebrate mitochondrial genomes, and, in addition, a version for two-state characters (such as R/Y coding) is implemented. A key finding is that some changes in mutational mechanism differentially affect the mutation rate between pairs of nucleotides: there does not appear to be a universal change in "rate" of evolution. It remains to be explored whether detecting changes in certain nucleotide interchanges can be localized to a particular part of the DNA replication/repair system. In order to estimate divergence dates it may eventually be advantageous to use the nucleotide interchanges that show little rate change.
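
The figures of 39 parameters and 63 observable values can be reproduced by a simple count. This is one standard way to arrive at them, assuming a root base distribution plus one unconstrained transition matrix per sequence; the abstract itself does not spell out the breakdown, and the function names are hypothetical.

```python
def free_parameters(states, leaves=3):
    """Free parameters: root distribution plus one transition matrix per leaf."""
    root = states - 1                   # root frequencies sum to 1
    per_matrix = states * (states - 1)  # each matrix row sums to 1
    return root + leaves * per_matrix

def observable_values(states, leaves=3):
    """Independent site-pattern frequencies across the aligned sequences."""
    return states ** leaves - 1         # pattern frequencies sum to 1

nucleotide = (free_parameters(4), observable_values(4))  # (39, 63)
two_state = (free_parameters(2), observable_values(2))   # (7, 7)
```

Under the same counting, the two-state (R/Y) version has as many parameters as observable values.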

5.
Nucleoids, a subnuclear system capable of chain elongation   Total citations: 1, self-citations: 0, citations by others: 1
Nucleoids, prepared by salt extraction of non-DNase-digested nuclei, have properties similar, but not identical, to those of nuclear matrices which are prepared by salt extraction of DNase-digested nuclei. Nuclear matrices retained less pulse-labelled DNA, slightly less bound DNA polymerase alpha and DNA primase, but had greater in vitro DNA synthesis and in vitro priming. Nucleoids contained larger (110 S) DNA chains than nuclear matrices (30 S). Each type of residual nuclear structure could synthesize 4.5 S Okazaki fragments. When extracted with increasing concentrations of salt, DNase-digested nuclei lost the ability for further elongation of the 4.5 S DNA intermediate after 0.1-0.2 M NaCl, whereas undigested nuclei retained this ability up to 0.9 M NaCl. Chain elongation to 28 S DNA chains could be restored to nucleoids, but not to nuclear matrices, by the addition of nuclear extracts.

6.
Length Mutations in Human Mitochondrial DNA   Total citations: 42, self-citations: 8, citations by others: 42
R. L. Cann, A. C. Wilson. Genetics, 1983, 104(4): 699-711
By high-resolution restriction mapping of mitochondrial DNAs purified from 112 human individuals, we have identified 14 length variants caused by small additions and deletions (from about 6 to 14 base pairs in length). Three of the 14 length differences are due to mutations at two locations within the D loop, whereas the remaining 11 occur at seven sites that are probably within other noncoding sequences and at junctions between coding sequences. In five of the nine regions of length polymorphism, there is a sequence of five cytosines in a row, this sequence being comparatively rare in coding DNA. Phylogenetic analysis indicates that, in most of the polymorphic regions, a given length mutation has arisen several times independently in different human lineages. The average rate at which length mutations have been arising and surviving in the human species is estimated to be many times higher for noncoding mtDNA than for noncoding nuclear DNA. The mystery of why vertebrate mtDNA is more prone than nuclear DNA to evolve by point mutation is now compounded by the discovery of a similar bias toward rapid evolution by length mutation.

7.
An improved quantitative model describing a protective function of eukaryotic genomic noncoding sequences was developed. In this new model, two factors affecting gene protection from chemical mutagens are considered: (1) the ratio of the total lengths of coding and noncoding genomic sequences and (2) the volume of the cell nucleus. An increase in the noncoding DNA in the genome reduces the number of mutagen-damaged nucleotides in the coding region, whereas an increase in the volume of the nucleus decreases the flow of mutagens per unit of nuclear volume that attacks its surface.

9.
An exact expression for the variance of random frequency that a given word has in text generated by a Markov chain is presented. The result is applied to periodic Markov chains, which describe the protein-coding DNA sequences better than simple Markov chains. A new solution to the problem of word overlap is proposed. It was found that the expected frequency and overlapping properties determine most of the variance. The expectation and variance of counts for triplets are compared with experimental counts in Escherichia coli coding sequences.

10.
We describe a system for the introduction, selection, and expression of eucaryotic genes in higher eucaryotic cells. The carrier consisted of the herpes simplex virus 1 (HSV-1) tk gene covalently linked to an HSV-1 alpha promoter directed away from the tk gene. In this study we fused to the alpha promoter the 5' transcribed noncoding sequences and the coding sequences of the chicken oviduct ovalbumin gene. Cells converted to the TK+ phenotype with this chimeric fragment produced an ovalbumin precursor which was processed and secreted into the extracellular fluid. The ovalbumin gene utilized the HSV-1 alpha promoter and was regulated as a viral gene inasmuch as inversion of the genomic DNA relative to the alpha promoter resulted in no ovalbumin synthesis, and production of ovalbumin was enhanced after superinfection with HSV-1. Synthesis of ovalbumin was not detected when cDNA was linked to the HSV-1 alpha promoter. The carrier system described in this study is suitable for introduction, selection, and expression of eucaryotic genes whose natural promoter is either weak or requires the presence of regulatory elements which may be absent from undifferentiated cells in culture.

11.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

12.
13.
Recently, the application of two statistical methods (related to Zipf's distribution and Shannon's redundancy), called 'linguistic' tests, to the primary structure of DNA sequences of living organisms has excited considerable interest. Of particular importance is the claim that noncoding DNA sequences in eukaryotes display specific 'linguistic' features, being reminiscent of natural languages. Furthermore, this implies that noncoding regions of DNA may carry some new, thus far unknown, biological information which is revealed by these tests. In this paper these claims are tested quantitatively. With the aid of computer simulations of natural DNA sequences, and by applying the same 'linguistic' tests to both natural and artificial sequences, we investigate in detail the reasons for the appearance of the claimed 'linguistic' features and the associated differences between coding and noncoding DNAs. The presented results show quantitatively that the 'linguistic' tests failed to reveal any new biological information in (noncoding or coding) DNA.
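
For orientation, a Shannon-style redundancy can be sketched as one minus the ratio of observed entropy to the maximum entropy of the alphabet. The abstract does not give its formula, so the single-base definition below is an assumption; the 'linguistic' tests it refers to operate on multi-base words rather than single bases, and the name `shannon_redundancy` is hypothetical.

```python
import math
from collections import Counter

def shannon_redundancy(seq, alphabet_size=4):
    """Assumed definition: 1 - H / log2(alphabet_size), with H from base frequencies."""
    counts = Counter(seq)
    total = len(seq)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

uniform = shannon_redundancy("ACGT")    # equal base frequencies, no redundancy
repeated = shannon_redundancy("AAAA")   # a single base, maximal redundancy
```

Applying the same measure to natural and simulated sequences, as the abstract describes, would require a sequence generator that is not sketched here.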

14.
The chemical structure of DNA is characterized by sequences of four nitrogenous bases occurring in one of two nucleic acid chains and in a complementary fashion in the other. A Markov chain is a model from probability theory describing transitions among discrete states in which each transition has a fixed probability that is not affected by the history of the system. It is shown that DNA can be represented in the form of a regular Markov chain. The ergodicity property and the law of large numbers follow from the statistical analysis of stationary transition probabilities.
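
For a regular Markov chain, repeated application of the transition matrix converges to a unique stationary distribution, which is what ergodicity provides in practice. The sketch below shows this by power iteration on a toy two-state matrix; the matrix values and the function name `stationary_distribution` are illustrative assumptions, not estimates from DNA data.

```python
def stationary_distribution(P, iterations=200):
    """Approximate the stationary row vector pi satisfying pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n                  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy two-state chain (for example purine/pyrimidine); every entry is positive,
# so the chain is regular.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)         # approaches [5/6, 1/6]
```

The limit can be checked by hand from the balance condition 0.1 * pi[0] = 0.5 * pi[1].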

15.
Prevalence of quadruplexes in the human genome   Total citations: 28, self-citations: 17, citations by others: 11
Guanine-rich DNA sequences of a particular form have the ability to fold into four-stranded structures called G-quadruplexes. In this paper, we present a working rule to predict which primary sequences can form this structure, and describe a search algorithm to identify such sequences in genomic DNA. We count the number of quadruplexes found in the human genome and compare that with the figure predicted by modelling DNA as a Bernoulli stream or as a Markov chain, using windows of various sizes. We demonstrate that the distribution of loop lengths is significantly different from what would be expected in a random case, providing an indication of the number of potentially relevant quadruplex-forming sequences. In particular, we show that there is a significant repression of quadruplexes in the coding strand of exonic regions, which suggests that quadruplex-forming patterns are disfavoured in sequences that will form RNA.
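
A pattern-based search of the kind described can be sketched with a regular expression. The abstract does not state its working rule, so the pattern below is an assumption: four runs of at least three guanines separated by loops of one to seven bases, a form commonly used for quadruplex motif searches. The names `G4_PATTERN` and `find_quadruplex_motifs` are hypothetical.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not quoted from the abstract)
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

def find_quadruplex_motifs(seq):
    """Return (start, end) spans of non-overlapping motif matches on one strand."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]

hits = find_quadruplex_motifs("AAGGGTTAGGGTTAGGGTTAGGGAA")  # toy telomere-like repeat
misses = find_quadruplex_motifs("ACGTACGTACGT")
```

This scans only the given strand; a genome-wide count would also need the reverse complement and a decision about overlapping matches, neither of which is handled here.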

16.
The nuclear genome of eukaryotes contains large amounts of cytoplasmic organelle DNA (nuclear integrants of organelle DNA [norgs]). The recent sequencing of many mitochondrial and chloroplast genomes has enabled investigation of the potential role of norgs in endosymbiotic evolution. In this article, we describe a new polymerase chain reaction-based method that allows the identification and evolutionary study of recent and older norgs in a range of eukaryotes. We tested this method in the genus Nicotiana and obtained sequences from seven nuclear integrants of plastid DNA (nupts) totaling 25 kb in length. These nupts were estimated to have been transferred 0.033 to 5.81 million years ago. The spectrum of mutations present in the potential protein-coding sequences compared with the noncoding sequences of each nupt revealed that nupts evolve in a nuclear-specific manner and are under neutral evolution. Indels were more frequent in noncoding regions than in potential coding sequences of former chloroplastic DNA, most probably due to the presence of a higher number of homopolymeric sequences. Unexpectedly, some potential protein-coding sequences within the nupts still contained intact open reading frames for up to 5.81 million years. These results suggest that chloroplast genes transferred to the nucleus have in some cases several millions of years to acquire nuclear regulatory elements and become functional. The different factors influencing this time frame and the potential role of nupts in endosymbiotic gene transfer are discussed.

17.
Five independent clones containing the natural chicken ovomucoid gene have been isolated from a chicken gene library. One of these clones, CL21, contains the complete ovomucoid gene and includes more than 3 kb of DNA sequences flanking both termini of the gene. Restriction endonuclease mapping, electron microscopy and direct DNA sequencing analyses of this clone have revealed that the ovomucoid gene is 5.6 kb long and codes for a messenger RNA of 821 nucleotides. The structural gene sequence coding for the mature messenger RNA is split into at least eight segments by a minimum of seven intervening sequences of various sizes. The shortest structural gene segment is only 20 nucleotides long. All seven intervening sequences are located within the peptide coding region of the gene, and the sequences at the 5' and 3' untranslated regions of the mRNA are not interrupted by intervening sequences. The DNA sequences of the regions flanking the 5' and 3' termini of the gene have been determined. Thirty nucleotides before the start of the messenger RNA coding sequence is the heptanucleotide TATATAT, which is also present in a similar location relative to the chicken ovalbumin gene and other unique sequence eucaryotic genes. This sequence resembles that of the Pribnow box in procaryotic genes where a promoter function has been implicated. Seven nucleotides past the 3' end of the gene is the tetranucleotide TTGT, a sequence found to be present at identical locations as either TTTT or TTGT in other eucaryotic genes that have been sequenced. These conserved DNA sequences flanking eucaryotic genes may serve some regulator function in the expression of these genes.

18.
The DNA coding sequence for the hygromycin B phosphotransferase gene was placed under the control of the regulatory sequences of a cloned long terminal repeat of Moloney sarcoma virus. This construction allowed direct selection for hygromycin B resistance after transfection of eucaryotic cell lines not naturally resistant to this antibiotic, thus providing another dominant marker for DNA transfer in eucaryotic cells.

19.
The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.

20.
Polymerase chain reaction (PCR) amplification was employed to construct a mosaic gene consisting of the propeptide region of protein S and the glutamic acid-rich domain of osteonectin. The strategy is straightforward, results in large amounts of material, and is universally applicable for the generation of protein domain chimeras. In some cases 10% dimethyl sulfoxide aided the amplification. Four base CCGC "clamp" sequences adjacent to BamHI restriction sites at the ends of the PCR products were used to enhance the ligation of products. A hybrid inverse complement oligonucleotide primer composed of sequences containing 20 nucleotides of protein S and 16 nucleotides of osteonectin was used in the first round of PCR. An additional osteonectin sequence was added to the initial amplified product by performing PCR using a second "boot-strap" primer containing 18 nucleotides of osteonectin. Primers used to amplify osteonectin encompassed the 146-amino-acid NH2-terminal half of osteonectin. The double-stranded first-round fragments of protein S-osteonectin and osteonectin were subsequently mixed together and one elongation cycle of PCR was performed. Annealing occurred as the result of the 34-base-pair overlap region composed of osteonectin sequence. Taq polymerase was used for elongation with subsequent recombinant DNA synthesis. After elongation, external primers were added to amplify the protein S-osteonectin gene construct. The protocol we have developed allows noncoding and coding segments of DNA to be linked, GC-rich areas of DNA to be amplified, hybridization temperatures to be increased, annealing times to be reduced, and PCR of products to be subcloned.
