Similar literature
 Found 20 similar documents; search took 31 ms
1.
A statistical analysis of the occurrence of particular nucleotide runs (1 to 10 nucleotides long) in DNA sequences of different species has been carried out. There are considerable differences in run distributions in DNA sequences of prokaryotes, invertebrates and vertebrates. The distribution of various types of runs has been found to differ between coding and non-coding sequences. There is an abundance of short runs 1 to 2 nucleotides long in coding sequences, and a deficiency of such runs in non-coding regions. However, some interesting exceptions to this rule exist: the run distribution of adenine in prokaryotes and the distribution of purine-pyrimidine runs in eukaryotes. This may be explained by the fact that the distribution of runs is predetermined by structural peculiarities of the entire DNA molecule. Runs of guanine or cytosine three to six nucleotides long occur predominantly in the non-coding DNA regions of eukaryotes, especially vertebrates.
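The run statistics described in this abstract can be illustrated with a minimal sketch. The function name `run_length_distribution` and the `max_len` cutoff are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq, max_len=10):
    """Count runs of identical nucleotides, keyed by (base, run length).

    Hypothetical helper: runs longer than max_len are ignored, mirroring
    the 1 to 10 nucleotide range mentioned in the abstract.
    """
    counts = Counter()
    # groupby yields one group per maximal stretch of the same character
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length <= max_len:
            counts[(base, length)] += 1
    return counts


dist = run_length_distribution("AAAGGCTTTTA")
for (base, length), n in sorted(dist.items()):
    print(base, length, n)
```

Comparing such tables for coding and non-coding regions would require annotated sequences, which this sketch does not handle.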

2.
We present data on the frequencies of nucleotides and nucleotide substitutions in conservative DNA regions involved in the regulation of gene expression. Data on prokaryotes and eukaryotes are considered separately. In both cases DNA strands complementary to those which serve as templates for RNA polymerase have low frequencies of cytosine. The most conservative positions also have an increased frequency of adenine. Various substitutions in the series of homologous regulatory DNA sequences, as compared to their consensuses, have different frequencies. In prokaryotes guanine in a consensus sequence is substituted for at the lowest and adenine at the highest frequency, whereas in eukaryotes cytosine is substituted for at the lowest and guanine at the highest frequency. In both cases the nucleotides substituted for are most frequently replaced with cytosine. Deviations from consensus sequences tend to cluster in adjacent positions. The more pronounced the consequences of a nucleotide substitution, the higher the frequency of substitutions in adjacent positions. Possible explanations for these phenomena are discussed.

3.
Summary: Analysis of the sequence data available today, comprising more than 500,000 bases, confirms the previously observed phenomenon that there are distinct dinucleotide preferences in DNA sequences. Consistent behaviour is observed in the major sequence groups analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet preferences are common to all groups and are found in most sequences of the Los Alamos Library. The patterns seen in such large data sets are very significant statistically and biologically. Since they are present in numerous and diverse nucleotide sequences, one may conclude that they confer evolutionary advantages on the organism. In eukaryotes RR and YY dinucleotides are preferred over YR and RY (where R is a purine and Y a pyrimidine). Since opposite-chain nearest-neighbour purine clashes are major determinants of DNA structure, it appears that the tight packaging of DNA in nucleosomes disfavors, in general, such (YR and RY) steric repulsion.
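As a rough illustration of the RR/YY versus YR/RY comparison in this abstract, the sketch below tallies dinucleotides by purine/pyrimidine class. The helper names are hypothetical and the code only counts; it does not reproduce the statistical tests of the original study.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CT")


def ry_class(base):
    """Map a nucleotide to 'R' (purine) or 'Y' (pyrimidine); None otherwise."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def ry_dinucleotide_counts(seq):
    """Count overlapping dinucleotides by class: RR, YY, RY, YR."""
    seq = seq.upper()
    counts = Counter()
    for first, second in zip(seq, seq[1:]):
        a, b = ry_class(first), ry_class(second)
        if a and b:  # skip ambiguous symbols such as N
            counts[a + b] += 1
    return counts


print(ry_dinucleotide_counts("AGCT"))
```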

4.
5.
Wrinkled DNA.   Cited by: 15 (self-citations: 9, by others: 6)
The B form of poly d(GC):poly d(GC) in orthorhombic microcrystallites in oriented fibers has a secondary structure in which a dinucleotide is the repeated motif rather than a mononucleotide as in standard, smooth B DNA. One set of nucleotides (probably GpC) has the same conformations as the smooth form but the alternate (CpG) nucleotides have a different conformation at C3'-O3'. This leads to a distinctive change in the orientation of the phosphate groups. Similar perturbations can be detected in other poly d(PuPy):poly d(PuPy) DNAs such as poly d(IC):poly d(IC) and poly d(AT):poly d(AT) in their D forms which have tetragonal crystal environments. This suggests that such perturbations are intrinsic to all stretches of duplex DNA where purines and pyrimidines alternate and may play a role in the detection and exploitation of such sequences by regulatory proteins.

6.
We have used computer-assisted methods to search large amounts of the human, yeast and Escherichia coli genomes for inverted repeat (IR) and mirror repeat (MR) DNA sequence patterns. In highly supercoiled DNA some IRs can form cruciforms, while some MRs can form intramolecular triplexes, or H-DNA. We find that total IR and MR sequences are highly enriched in both eukaryotic genomes. In E. coli, however, only total IRs are enriched, while total MRs only occur as frequently as in random sequence DNA. We then used a set of experimentally derived criteria to predict which of the total IRs and MRs are most likely to form cruciforms or H-DNA in supercoiled DNA. We show that strong cruciform forming sequences occur at a relatively high frequency in yeast (1/19 700 bp) and humans (1/41 800 bp), but that H-DNA forming sequences are abundant only in humans (1/49 400 bp). Strong cruciform and H-DNA forming sequences are not abundant in the E. coli genome. These results suggest that cruciforms and H-DNA may have a functional role in eukaryotes, but probably not prokaryotes.
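The difference between the two patterns can be sketched as follows: in an inverted repeat the second arm is the reverse complement of the first, while in a mirror repeat it is the plain reverse on the same strand. The function `find_repeats` and its `arm` and `max_spacer` parameters are illustrative assumptions; the abstract does not give the experimentally derived criteria the authors used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return s.translate(COMPLEMENT)[::-1]


def find_repeats(seq, arm=6, max_spacer=4):
    """Naive scan for inverted (IR) and mirror (MR) repeats.

    Returns tuples (kind, start, spacer). Arm length and spacer limit are
    assumed values chosen for illustration only.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            right = seq[j:j + arm]
            if len(right) < arm:
                break  # second arm would run past the end of the sequence
            if right == reverse_complement(left):
                hits.append(("IR", i, spacer))
            if right == left[::-1]:
                hits.append(("MR", i, spacer))
    return hits


print(find_repeats("GAATTC", arm=3, max_spacer=0))
print(find_repeats("AGGCCGGA", arm=3, max_spacer=2))
```

This exhaustive scan is quadratic in the worst case and is meant only to make the definitions concrete, not to scale to whole genomes.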

7.
8.
Species-specific patterns of DNA bending and sequence.   Cited by: 16 (self-citations: 6, by others: 10)
Nucleotide sequences in the GenEMBL database were analyzed using strategies designed to reveal species-specific patterns of DNA bending and DNA sequence. The results uncovered striking species-dependent patterns of bending with more variations among individual organisms than between prokaryotes and eukaryotes. The frequency of bent sites in sequences from different bacteria was related to genomic A + T content and this relationship was confirmed by electrophoretic analysis of genomic DNA. However, base composition was not an accurate predictor for DNA bending in eukaryotes. Sequences from C. elegans exhibited the highest frequency of bent sites in the database and the RNA polymerase II locus from the nematode was the most bent gene in GenEMBL. Bent DNA extended throughout most introns and gene flanking segments from C. elegans while exon regions lacked A-tract bending characteristics. Independent evidence for the strong bending character of this genome was provided by electrophoretic studies which revealed that a large number of the fragments from C. elegans DNA exhibited anomalous gel mobilities when compared to genomic fragments from over 20 other organisms. The prevalence of bent sites in this genome enabled us to detect selectively C. elegans sequences in a computer search of the database using as probes C. elegans introns, bending elements, and a 20 nucleotide consensus sequence for bent DNA. This approach was also used to provide additional examples of species-specific sequence patterns in eukaryotes where it was shown that (A)≥10 and (A·T)≥5 tracts are prevalent throughout the untranslated DNA of D. discoideum and P. falciparum, respectively. These results provide new insight into the organization of eukaryotic DNA because they show that species-specific patterns of simple sequences are found in introns and in other untranslated regions of the genome.

9.
The nucleotide sequence running from the genetic left end of bacteriophage T7 DNA to within the coding sequence of gene 4 is given, except for the internal coding sequence for the gene 1 protein, which has been determined elsewhere. The sequence presented contains nucleotides 1 to 3342 and 5654 to 12,100 of the approximately 40,000 base-pairs of T7 DNA. This sequence includes: the three strong early promoters and the termination site for Escherichia coli RNA polymerase; eight promoter sites for T7 RNA polymerase; six RNAase III cleavage sites; the primary origin of replication of T7 DNA; the complete coding sequences for 13 previously known T7 proteins, including the anti-restriction protein, protein kinase, DNA ligase, the gene 2 inhibitor of E. coli RNA polymerase, single-strand DNA binding protein, the gene 3 endonuclease, and lysozyme (which is actually an N-acetylmuramyl-L-alanine amidase); the complete coding sequences for eight potential new T7-coded proteins; and two apparently independent initiation sites that produce overlapping polypeptide chains of gene 4 primase. More than 86% of the first 12,100 base-pairs of T7 DNA appear to be devoted to specifying amino acid sequences for T7 proteins, and the arrangement of coding sequences and other genetic elements is very efficient. There is little overlap between coding sequences for different proteins, but junctions between adjacent coding sequences are typically close, the termination codon for one protein often overlapping the initiation codon for the next. For almost half of the potential T7 proteins, the sequence in the messenger RNA that can interact with 16 S ribosomal RNA in initiation of protein synthesis is part of the coding sequence for the preceding protein. The longest non-coding region, about 900 base-pairs, is at the left end of the DNA. The right half of this region contains the strong early promoters for E. coli RNA polymerase and the first RNAase III cleavage site. The left end contains the terminal repetition (nucleotides 1 to 160), followed by a striking array of repeated sequences (nucleotides 175 to 340) that might have some role in packaging the DNA into phage particles, and an A·T-rich region (nucleotides 356 to 492) that contains a promoter for T7 RNA polymerase, and which might function as a replication origin.

10.
The distributions of the junction sequences of homooligomer tracts of various lengths have been examined in prokaryotic DNA sequences and compared with those of eukaryotes. The general trends in the nearest and next to nearest neighbors to the tracts are similar for both groups. In both prokaryotes and eukaryotes A/T runs are preferentially flanked on either the 5' or the 3' ends by A and/or T. G/C runs are preferentially flanked by G and/or C. There is discrimination against A/T runs flanked by G or C and G/C runs flanked by A or T. However, whereas the distribution of prokaryotic homooligomer tract junction sequences was quite homogeneous, large variations were observed in the 5-fold larger eukaryotic database, increasing in magnitude from tracts of length 2 to 3 to 4 base pairs long. Possible DNA conformational implications and in particular DNA curvature and packaging aspects of prokaryotes and eukaryotes are discussed.
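A minimal sketch of how nearest-neighbor flanks of homooligomer tracts might be tabulated is given below. The function `tract_flanks` is hypothetical, considers only the immediate 5' and 3' neighbors on one strand, and is not the procedure of the original paper.

```python
import re
from collections import Counter


def tract_flanks(seq, base, length):
    """Count the 5' and 3' neighbours of tracts of exactly `length` x `base`.

    Lookarounds make sure the tract is not part of a longer run of the
    same base. Tracts touching a sequence end contribute only one flank.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?<!{base}){base}{{{length}}}(?!{base})")
    five_prime, three_prime = Counter(), Counter()
    for match in pattern.finditer(seq):
        if match.start() > 0:
            five_prime[seq[match.start() - 1]] += 1
        if match.end() < len(seq):
            three_prime[seq[match.end()]] += 1
    return five_prime, three_prime


five, three = tract_flanks("TAAAGCAAAT", "A", 3)
print(dict(five), dict(three))
```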

11.
Computer programs for the assembly of DNA sequences.   Cited by: 26 (self-citations: 20, by others: 6)
A collection of user-interactive computer programs is described which aid in the assembly of DNA sequences. This is achieved by searching for the positions of overlapping common nucleotide sequences within the blocks of sequence obtained as primary data. Such overlapping segments are then melded into one continuous string of nucleotides. Strategies for determining the accuracy of the sequence being analyzed and reducing the error rate resulting from the manual manipulation of sequence data are discussed. Sequences mapping from 97.3 to 100% of the Ad2 virus genome were used to demonstrate the performance of these programs.
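The overlap-and-meld idea in this abstract can be sketched for two fragments as a simple exact suffix/prefix match. The names `longest_overlap` and `meld` and the `min_len` threshold are assumptions for illustration; the original programs were interactive and are not described in enough detail here to reproduce.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least min_len bases exists.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def meld(left, right, min_len=3):
    """Join two fragments through their overlap, or return None if none."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        return None
    return left + right[k:]


print(meld("ACGTTGCA", "TGCAGGT"))
```

Real sequencing reads contain errors, so a practical assembler would need approximate matching and handling of both strands, which this sketch omits.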

12.
Complexity charts can be used to map functional domains in DNA   Cited by: 4 (self-citations: 0, by others: 4)
We measured local compositional complexity (LCC) of DNA sequences by calculating Shannon information content over mononucleotide frequencies. Eukaryotic DNA appeared to be "simpler" than bacterial DNA even at the level of short oligonucleotides. Moreover, different DNA functional domains displayed different compositional complexity in a systematic manner. In particular, the complexity of exon sequences was systematically higher than the complexity of corresponding introns. We therefore present examples of complexity charts (plots of complexity versus position in sequence) for pre-mRNA sequences from higher eukaryotes. By taking a window width of 100 nucleotides and a window step of 1 nucleotide, introns can be distinguished from exons in the majority of cases studied. Complexity charts of immunoglobulin variable regions allowed correct mapping of exons and introns in these sequences as well, a task that was impossible with commercial programs available to date.
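A complexity chart of the kind described here can be approximated by computing the Shannon entropy of mononucleotide frequencies in a sliding window, H = -Σ p_i log2 p_i. The sketch below uses the window width of 100 and step of 1 quoted in the abstract as defaults; the function names and the exact normalization are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the symbol frequencies in `window`."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def complexity_chart(seq, width=100, step=1):
    """Return (position, entropy) pairs for each window along the sequence."""
    seq = seq.upper()
    return [
        (start, shannon_entropy(seq[start:start + width]))
        for start in range(0, len(seq) - width + 1, step)
    ]


# Toy example with a short window so the output stays readable
for position, entropy in complexity_chart("AAAAACGTACGT", width=4, step=4):
    print(position, round(entropy, 3))
```

For DNA the entropy ranges from 0 bits (a homopolymer window) to 2 bits (all four bases equally frequent).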

13.
By means of restriction enzyme analysis and molecular hybridization, the distribution of repeated DNA families has been studied in the different DNA components into which the human genome can be fractionated by density gradient techniques. Three classes of DNA molecules have been analyzed: i) a homogeneous DNA component (satellite-like sequences; ρ = 1.696 g/cm³, 3% of total DNA, AT repeated), ii) an AT-rich component (ρ = 1.698 g/cm³, 30% of total DNA, AT main-band) and iii) a GC-rich component (ρ = 1.708 g/cm³, 6% of total DNA, GC main-band). By this approach we have observed that Sau3A digestion of GC main-band gives rise to two bands of 75 bp and 150 bp, absent or under-represented in both AT-rich DNA components. A preliminary characterization of these DNA fragments suggests that they contain one or more families of repeated sequences which fail to hybridize to EcoRI, HindIII and AluI families of repeats. In addition, we have observed that EcoRI sequences (alpha-RI DNA) are under-represented in GC main-band and show the same clustered organization in both AT-rich DNA components.

14.
Studies done in prokaryotes and eukaryotes have indicated that DNA sequence divergence decreases the frequency of homologous recombination. To determine which step(s) of homologous recombination is sensitive to DNA sequence divergence in mammalian cells we have used an assay that does not rely on the recovery of functional products. The assay is based on the acquisition by homologous recombination of endogenous LINE-1 sequences by exogenous LINE-1 sequences. In parallel experiments, we introduced into mouse cells two gapped exogenous LINE-1 sequences, one from the mouse, L1Md-A2, and the other from the rat, L1Rn-3. Although L1Rn-3 is on average less than 85% homologous to the LINE-1 elements of the mouse, the frequency of homologous recombination with endogenous LINE-1 elements obtained with L1Rn-3 was the same as the one obtained with L1Md-A2 which is on average 95% homologous to the LINE-1 elements of the mouse. The endogenous LINE-1 sequences rescued by L1Rn-3 were 8-18% divergent from L1Rn-3 sequences, whereas those rescued by L1Md-A2 were 2-5% divergent from L1Md-A2 sequences. The gap which had been introduced into the exogenous LINE-1 sequences had been precisely repaired in 50% of the recombinants obtained with L1Md-A2. None of the L1Rn-3 recombinants showed precise gap repair.(ABSTRACT TRUNCATED AT 250 WORDS)

15.
16.
Multi-dimensional scaling is applied to our codon space data on the protein coding sequences of DNA from a wide variety of organisms in an attempt to find the smallest number of parameters which will accurately represent these sequences. I find that a three-dimensional representation is satisfactory. One of the three resulting co-ordinates separates eukaryotes and their associated viruses from prokaryotes and their associated phages, while an orthogonal co-ordinate separates those organisms capable of synthesizing proteins (eukaryotes and prokaryotes) from those not so capable (viruses and phages). Mitochondria show no relation in our plots to any of these groups.

17.
Doublet frequencies in evolutionary distinct groups.   Cited by: 15 (self-citations: 9, by others: 6)
We analyze the dinucleotide frequencies of occurrence and preferences separately within the vertebrates, nonvertebrates, DNA viruses, mitochondria, RNA viruses, bacteria and phage sequences. Over half a million nucleotides from more than 400 sequences were used in this study. Distinct patterns are observed. Some of the patterns are common to all sequences, some to either eukaryotes or prokaryotes and others to the subgroups within them. Doublets are the most basic ingredient of order in nucleotide sequences. We suggest that their preferences and the arrangement of nucleotides in the DNA in general is determined to a large extent by the conformational and packaging considerations of the double helix. Some principles of DNA conformation are viewed in light of our results.
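One common way to express a doublet preference is the ratio of the observed dinucleotide frequency to the frequency expected from the mononucleotide composition, f(XY) / (f(X) f(Y)). The abstract does not state which statistic the authors used, so the sketch below is an assumption, and `dinucleotide_odds` is a hypothetical name.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Observed/expected ratio for each dinucleotide present in `seq`.

    Expected frequency assumes independent neighbouring bases. Values
    above 1 suggest a preferred doublet, below 1 an avoided one.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(a + b for a, b in zip(seq, seq[1:]))
    n_mono = len(seq)
    n_di = len(seq) - 1
    odds = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        odds[pair] = (count / n_di) / expected
    return odds


for pair, ratio in sorted(dinucleotide_odds("ACGTACGTAACC").items()):
    print(pair, round(ratio, 2))
```

On short sequences these ratios are noisy; the abstract's data set of over half a million nucleotides is what makes such ratios meaningful.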

18.
19.
20.
The DNA base sequence changes induced by mutagenesis with ultraviolet light have been determined in a gene on a chromosome of cultured Chinese hamster ovary (CHO) cells. The gene was the Escherichia coli gpt gene, of which a single copy was stably incorporated and expressed in the CHO cell genome. The cells were irradiated with ultraviolet light and gpt- colonies were selected by resistance to 6-thioguanine. The gpt gene was amplified from chromosomal DNA by use of the polymerase chain reaction (PCR), and the amplified DNA sequenced directly by the dideoxy method. Of the 58 sequenced mutants of independent origin 53 were base change mutations. Forty-one base substitutions were single base changes, ten had two adjacent (or tandem) base changes, and one had two base changes separated by a single base-pair. Only one mutant had a multiple base change mutation with two or more well separated base changes. In contrast much higher levels of such mutations were reported in ultraviolet mutagenesis of genes on a shuttle vector in primate cells. Two deletions of a single base-pair were observed and three deletions ranging from 6 to 37 base-pairs. The mutation spectrum in the gpt gene had similarities to the ultraviolet mutation spectra for several genes in prokaryotes, which suggests similarities in mutational mechanisms in prokaryotes and eukaryotes.

