Similar articles
 20 similar articles found (search time: 15 ms)
1.
One of the fascinating properties of the DNA sequences of prokaryotic and eukaryotic chromosomes is that they possess long-range order. Computational methods such as spectral analysis, mutual information and DNA random walks have been used to probe long-range order via long-range correlations. This work attempts to show the advantage of using the information-theoretic measure of mutual information for this purpose. A number, Mu, is found which indicates the existence of long-range order. Mu is the ratio of the value of the mutual information function between two nucleotides of a DNA sequence separated by a large distance of 100 kilobases to the value expected from a randomized sequence of the same DNA. It is found that, in spite of the constant shuffling of nucleotides due to insertion, deletion, inversion and recombination that occur during evolution, the chromosomal structure of prokaryotes is not always mosaic. While all archaeal chromosomes show mosaic structure and lack long-range order, a sizable fraction of bacterial chromosomes do possess long-range order. A multivariate statistical analysis was done to find which physical variables, such as genome size or GC%, affect the organization of the chromosome or correlate with the long-range order. The existence of long-range order in a bacterial chromosome could be directly correlated with the degree of gene strand bias it shows. Firmicutes, which have low GC content, also have pronounced strand bias and show long-range correlations. It is observed that the occurrence of long-range order in bacteria is independent of genome size, but depends on GC content and gene strand bias.
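
The ratio Mu described in this abstract can be illustrated with a minimal sketch. It assumes a plug-in (frequency-count) estimate of mutual information and a baseline averaged over a few random shuffles; the function names `mutual_information` and `mu_ratio`, the number of shuffles and the seed are hypothetical choices, not taken from the paper, which evaluates the function at a separation of 100 kilobases.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Plug-in estimate (in bits) of the mutual information between
    symbols that are k positions apart in seq."""
    n = len(seq) - k
    if n <= 0:
        raise ValueError("k must be smaller than the sequence length")
    pairs = Counter(zip(seq[:n], seq[k:]))   # joint counts of (x_i, x_{i+k})
    left = Counter(seq[:n])                  # marginal counts of x_i
    right = Counter(seq[k:])                 # marginal counts of x_{i+k}
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi


def mu_ratio(seq, k, shuffles=10, seed=0):
    """Observed mutual information at distance k divided by the mean
    value obtained from shuffled copies of the same sequence."""
    rng = random.Random(seed)
    observed = mutual_information(seq, k)
    baseline = 0.0
    for _ in range(shuffles):
        chars = list(seq)
        rng.shuffle(chars)                   # destroys order, keeps composition
        baseline += mutual_information("".join(chars), k)
    baseline /= shuffles
    return observed / baseline if baseline > 0 else float("inf")
```

Shuffling preserves base composition, so under these assumptions a ratio well above 1 indicates order beyond what composition alone produces. The baseline is small but not zero for finite sequences, because the plug-in estimator is biased upward.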

2.
Clay O, Carels N, Douady C, Macaya G, Bernardi G. Gene, 2001, 276(1-2): 15-24
GC level distributions of a species' nuclear genome, or of its compositional fractions, encode key information on structural and functional properties of the genome and on its evolution. They can be calculated either from absorbance profiles of the DNA in CsCl density gradients at sedimentation equilibrium, or by scanning long contigs of largely sequenced genomes. In the present study, we address the quantitative characterization of the compositional heterogeneity of genomes, as measured by the GC distributions of fixed-length fragments. Special attention is given to mammalian genomes, since their compartmentalization into isochores implies two levels of heterogeneity, intra-isochore (local) and inter-isochore (global). This partitioning is a natural one, since large-scale compositional properties vary much more among isochores than within them. Intra-isochore GC distributions become roughly Gaussian for long fragments, and their standard deviations decrease only slowly with increasing fragment length, unlike random sequences. This effect can be explained by 'long-range' correlations, often overlooked, that are present along isochores.
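
The fixed-length fragment measurement described here can be sketched as follows. This is only an illustration, assuming non-overlapping fragments and an upper-case sequence; `gc_levels` and `gc_spread` are hypothetical names. For a sequence of independent bases with GC fraction p, the standard deviation of fragment GC levels is sqrt(p(1 - p)/L) for fragment length L, which is the benchmark against which a slower decrease would be judged.

```python
import statistics


def gc_levels(seq, fragment_length):
    """GC fraction of each non-overlapping fragment of the given length.
    A trailing partial fragment is ignored."""
    levels = []
    for start in range(0, len(seq) - fragment_length + 1, fragment_length):
        fragment = seq[start:start + fragment_length]
        gc_count = sum(base in "GC" for base in fragment)
        levels.append(gc_count / fragment_length)
    return levels


def gc_spread(seq, fragment_lengths):
    """Population standard deviation of fragment GC levels, keyed by
    fragment length. Each length must yield at least one fragment."""
    return {
        length: statistics.pstdev(gc_levels(seq, length))
        for length in fragment_lengths
    }
```

Calling `gc_spread(sequence, [1000, 10000, 100000])` on a long contig and comparing the values with sqrt(p(1 - p)/L) would show whether the spread falls off more slowly than for independent bases.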

3.
We propose a model for generating "artificial" nucleotide sequences and, by the method of mapping those sequences onto a "DNA-walk," we analyze the presence of correlation between nucleotides. Artificial sequences are constructed considering, basically, interactions between first neighbors and between more distant units. We show that long-range correlations may be favored by the occurrence of intrastrand interactions, which give a nonlinear characteristic to the sequence.
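
The mapping of a sequence onto a DNA walk can be sketched as below. The abstract does not state the step rule, so this sketch assumes one common convention: a step up for a pyrimidine (C or T) and a step down for a purine (A or G). The names `dna_walk` and `fluctuation` are hypothetical.

```python
def dna_walk(seq, up="CT", down="AG"):
    """Cumulative displacement of the walk: +1 for each base in `up`,
    -1 for each base in `down`, 0 for anything else (for example N)."""
    position = 0
    walk = [0]
    for base in seq.upper():
        if base in up:
            position += 1
        elif base in down:
            position -= 1
        walk.append(position)
    return walk


def fluctuation(walk, length):
    """Root-mean-square fluctuation of the displacement over all
    stretches of `length` steps (length must be shorter than the walk)."""
    deltas = [walk[i + length] - walk[i] for i in range(len(walk) - length)]
    mean = sum(deltas) / len(deltas)
    return (sum((d - mean) ** 2 for d in deltas) / len(deltas)) ** 0.5
```

For uncorrelated steps the fluctuation grows roughly as the square root of the stretch length; faster growth on a log-log plot is the usual signature of long-range correlation in this kind of analysis.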

4.
Melting curves are calculated for infinitely long DNA-like random copolymers composed of AT and GC pairs of nucleotides. The entropy of random coil rings formed on melting is explicitly included through use of the Jacobson-Stockmayer ring-weighting factors. Transition curves are calculated for values of the cooperativity parameter σ in the range 10⁻² ≥ σ ≥ 10⁻⁴. Ninety percent of the melting occurs in ca. 0.2°C for σ ≤ 10⁻³ regardless of the mole fraction of GC. We conclude that observed breadths of thermal denaturation curves for native DNAs result from a superposition of essentially all-or-none melting of various regions of the molecule. It is argued that refined approximations to the ring-weighting factors are probably not important when compared with the effects produced by long-range base sequence correlations which are known to occur in native DNA.

5.
Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large, good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view on the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself, by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality, by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacterial and archaeal species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than with either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between the scaling exponent α and the COG distribution, and we consistently observed that the larger α was, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotide sequence structure and functional gene composition.
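
A minimal sketch of how a DFA scaling exponent might be estimated is given below. It assumes first-order (linear) detrending, non-overlapping windows and the purine (AG) vs. pyrimidine (CT) mapping named in the abstract; the window sizes and the function names `dfa_fluctuation` and `dfa_exponent` are hypothetical, and the study's actual settings are not given here.

```python
import math


def dfa_fluctuation(profile, window):
    """RMS residual after removing a least-squares line from each
    non-overlapping window of the profile (window must be >= 3)."""
    n_windows = len(profile) // window
    if window < 3 or n_windows == 0:
        raise ValueError("need 3 <= window <= len(profile)")
    x_mean = (window - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(window))
    total = 0.0
    for w in range(n_windows):
        ys = profile[w * window:(w + 1) * window]
        y_mean = sum(ys) / window
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys)) / sxx
        total += sum(
            (y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(ys)
        )
    return math.sqrt(total / (n_windows * window))


def dfa_exponent(seq, windows, up="AG"):
    """Slope of log F(window) against log window. Bases in `up` map to
    +1 and all others to -1. At least two distinct window sizes are
    needed, and each must give a non-zero fluctuation."""
    steps = [1 if base in up else -1 for base in seq.upper()]
    mean = sum(steps) / len(steps)
    profile, acc = [], 0.0
    for step in steps:
        acc += step - mean            # integrated, mean-centred series
        profile.append(acc)
    xs = [math.log(w) for w in windows]
    ys = [math.log(dfa_fluctuation(profile, w)) for w in windows]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    return (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
```

An exponent near 0.5 is what uncorrelated steps would give; values above 0.5 correspond to the persistent long-range correlations the abstract refers to.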

6.
Hepatitis C virus (HCV) is a positive-sense RNA virus approximately 9600 bases long. An internal ribosomal entry site (IRES) spans the 5' nontranslated region (5'NTR), which is the most conserved and highly structured region of the HCV genome. In this study, we demonstrate that nucleotides 428-442 of the HCV core-coding sequence anneal to nucleotides 24-38 of the 5'NTR, and that this RNA-RNA interaction modulates IRES-dependent translation in rabbit reticulocyte lysate and in HepG2 cells. The inclusion of the core-coding sequence (nucleotides 428-442) significantly suppressed the translational efficiency directed by the HCV IRES in dicistronic reporter systems, and this suppression was relieved by site-directed mutations that blocked the long-range interaction between nucleotides 24-38 and 428-442. These findings suggest that the long-range interaction between the HCV 5'NTR and the core-coding nucleotide sequence down-regulates cap-independent translation via the HCV IRES. The modulation of protein synthesis by long-range RNA-RNA interaction may play a role in the regulation of viral gene expression.

7.
A map encompassing 300 kilobases (kb) in and around the human alpha-globin gene complex shows features with important implications for understanding the structure and function of the human genome. In contrast to other segments of the mammalian genome that have been analysed by pulsed field gradient electrophoresis (PFGE), this region contains an unusually high density of sites for infrequently cutting restriction enzymes that recognise GC rich motifs including the under-represented CpG doublet. This suggests that the 26 kilobase (kb) stretch of DNA containing the alpha-globin gene family, which is known from sequence analysis to be 60% GC rich, is itself embedded within a region of high GC content. This long-range structure, identified by PFGE, corresponds to a class of GC rich isochores that are thought to represent early replicating DNA present in Giemsa negative chromosomal bands. The identification of such regions by PFGE will be of value in understanding the organisation of human chromosomes and will influence the strategies used to construct a physical map of the genome.

8.
Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both the mean and the exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.
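
To see why a correction to lambda matters for significance, the sketch below uses the standard extreme value (Karlin-Altschul) form for ungapped local alignment scores, P(S >= x) ≈ 1 - exp(-K m n e^(-lambda x)). This form, the prefactor `K` and the sequence lengths `m` and `n` are assumptions brought in for illustration; the abstract itself names only the scale parameter lambda, and the function name `alignment_p_value` is hypothetical.

```python
import math


def alignment_p_value(score, lam, K, m, n):
    """P-value of a local alignment score under the extreme value
    distribution with scale parameter lam and prefactor K, for
    sequences of lengths m and n."""
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed stably for small E
    return -math.expm1(-expected_hits)
```

Because lambda sits in the exponent, even a small change in its value rescales the p-value of a fixed score by a large factor, which is why the choice of null model affects reported significance.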

9.
10.
11.
The compositional properties of human genes (cited 8 times: 0 self-citations, 8 by others)
Summary: The present work represents the first attempt to study in greater detail previously proposed compositional correlations in genomes, based on a body of additional data relating to gene localizations as well as to extended flanking sequences extracted from gene banks. We have investigated the correlations that exist between (1) the GC levels of exons of human genes, and (2) the GC levels of either intergenic sequences or introns associated with the genes under consideration. In both cases, linear relationships with slopes close to unity were found. The similarity of the linear relationships indicates similar GC levels in intergenic sequences and introns located in the same isochores. Moreover, both intergenic sequences and introns showed GC levels 5–10% lower than the corresponding exons. The above findings considerably strengthen the previously drawn conclusion that coding and noncoding sequences (both inter- and intragenic) from the same isochores of the human genome are compositionally correlated. In addition, we find linear correlations between the GC levels of codon positions and of the intergenic sequences or introns associated with the corresponding genes, as well as among the GC levels of codon positions of genes.

12.
We present a theory for cooperative chiral order in the transition between right-handed B-DNA and left-handed Z-DNA. This theory, based on the random-field Ising model, predicts the characteristic length scale of Z-DNA segments. This length scale depends on whether the DNA is a homopolymer or a random sequence: it is approximately 4000 nucleotides in a homopolymer but only approximately 25 nucleotides in a random sequence. These theoretical results are consistent with experiments on DNA homopolymers and random sequences.

13.
T Bettecken, B Aissani, C R Müller, G Bernardi. Gene, 1992, 122(2): 329-335
The genomes of warm-blooded vertebrates are mosaics of long DNA segments (> 300 kb, on average), the isochores, homogeneous in GC levels, which belong to a small number of compositional families. In the present work, the human dystrophin-encoding gene, spanning more than 2.3 Mb in Giemsa band Xp21 (on the short arm of the X chromosome), was analyzed in its isochore organization by hybridizing cDNA probes, corresponding to eight contiguous segments of the coding sequence, on compositional fractions from human DNA. Five DNA regions of uniform (±0.5%) GC content, separated by compositional discontinuities of about 2% GC, were found, thus providing the first high-resolution compositional map obtained for a human genome locus and the first direct estimate of isochore size (360 kb to more than 770 kb, in the locus under consideration). One of the isochores contains 71% and another one 21% of the deletion breakpoints found in patients suffering from Duchenne's and Becker's muscular dystrophies.

14.
Patchwork structure of a bovine satellite DNA (cited 25 times: 0 self-citations, 25 by others)
M Pech, R E Streeck, H G Zachau. Cell, 1979, 18(3): 883-893
According to a previous restriction nuclease analysis, bovine 1.706 satellite DNA (density 1.706 g/cm³ in CsCl) is organized in an unusual structure of superimposed long- and short-range repeats (Streeck and Zachau, 1978). We have now determined the nucleotide sequence of this satellite DNA in both cloned fragments and fragments from the total satellite DNA. Each long-range repeat unit (about 2350 bp) is divided into four segments. Each segment consists of different variants of a basic 23 bp sequence which is itself composed of a dodecanucleotide and a related undecanucleotide. A total of 2400 nucleotides have been sequenced. Detailed analysis of the sequence divergence reveals that both the overall extent of divergence and the frequency of base changes at individual positions of the 23 bp repeats are characteristically different in the various segments. Preferentially methylated sites and a high incidence of symmetry elements are found. In two of the four segments, 22 of 23 bp of the prototype sequence are included in six overlapping elements of dyad symmetry and in a palindrome. A scheme for the evolution of the satellite DNA from a basic dodecanucleotide is proposed which is based on the different degrees of divergence for the various repeats superimposed in this satellite DNA.

15.
16.
P Early, H Huang, M Davis, K Calame, L Hood. Cell, 1980, 19(4): 981-992
We have determined the sequences of separate germline genetic elements which encode two parts of a mouse immunoglobulin heavy chain variable region. These elements, termed gene segments, are heavy chain counterparts of the variable (V) and joining (J) gene segments of immunoglobulin light chains. The VH gene segment encodes amino acids 1-101 and the JH gene segment encodes amino acids 107-123 of the S107 phosphorylcholine-binding VH region. This JH gene segment and two other JH gene segments are located 5' to the mu constant region gene (Cmu) in germline DNA. We have also determined the sequence of a rearranged VH gene encoding a complete VH region, M603, which is closely related to S107. In addition, we have partially determined the VH coding sequences of the S107 and M167 heavy chain mRNAs. By comparing these sequences to the germline gene segments, we conclude that the germline VH and JH gene segments do not contain at least 13 nucleotides which are present in the rearranged VH genes. In S107, these nucleotides encode amino acids 102-106, which form part of the third hypervariable region and consequently influence the antigen-binding specificity of the immunoglobulin molecule. This portion of the variable region may be encoded by a separate germline gene segment which can be joined to the VH and JH gene segments. We term this postulated genetic element the D gene segment, referring to its role in the generation of heavy chain diversity. Essentially the same noncoding sequences are found 3' to the VH gene segment and as inverse complements 5' to two JH gene segments. These are the same conserved nucleotides previously found adjacent to light chain V and J gene segments. Each conserved sequence consists of blocks of seven and ten conserved nucleotides which are separated by a spacer of either 11 or 22 nonconserved nucleotides. The highly conserved spacing, corresponding to one or two turns of the DNA helix, maintains precise spatial orientations between blocks of conserved nucleotides. Gene segments which can join to one another (VK and JK, for example) always have spacers of different lengths. Based on these observations, we propose a model for variable region gene rearrangement mediated by proteins which recognize the same conserved sequences adjacent to both light and heavy chain immunoglobulin gene segments.

17.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

18.
The primary structure of 28S ribosomal RNA constitutes a conserved core which is similar among most 23S-like rRNAs and expansion segments which occur at specific positions in the sequence. The expansion segments account for most of the size difference between prokaryotic (archaeal and eubacterial) and eukaryotic rRNAs and they exhibit a sequence variation which is unique among rRNAs. We have investigated the sequence variation of one of the expansion segments, V8, by sequencing a total of 111 V8 segments from 9 different human cell lines and tissues and have found 35 different variants. The variation occurs mainly at two 'hot spots' which are separated by 170 nucleotides in the primary sequence but are neighbours in the secondary structure. The sequence of V8 segments varies both within and between human cell lines and tissues. The implications for the evolution of the eukaryotic 28S rRNA are discussed together with possible functions of the expansion segments. We also present a secondary structure model for the V8 segment based on comparative sequence analysis and chemical and enzymatic footprinting.

19.
DNA microarrays have been widely adopted by the scientific community for a variety of applications. To improve the performance of microarrays there is a need for a fundamental understanding of the interplay between the various factors that affect microarray sensitivity and specificity. We use lattice Monte Carlo simulations to study the thermodynamics and kinetics of hybridization of single-stranded target genes in solution with complementary probe DNA molecules immobilized on a microarray surface. The target molecules in our system contain 48 segments and the probes tethered on a hard surface contain 8-24 segments. The segments on the probe and target are distinct and each segment represents a sequence of nucleotides (approximately 11 nucleotides). Each probe segment interacts exclusively with its unique complementary target segment with a single hybridization energy; all other interactions are zero. We examine how the probe length, temperature, or hybridization energy, and the stretch along the target that the probe segments complement, affect the extent of hybridization. For systems containing single probe and single target molecules, we observe that as the probe length increases, the probability of binding all probe segments to the target decreases, implying that the specificity decreases. We observe that probes 12-16 segments (approximately 132-176 nucleotides) long gave the highest specificity and sensitivity. This agrees with the experimental results obtained by another research group, who found an optimal probe length of 150 nucleotides. As the hybridization energy increases, the longer probes are able to bind all their segments to the target, thus improving their specificity. The hybridization kinetics reveal that the segments at the ends of the probe are most likely to start the hybridization. The segments toward the center of the probe remain bound to the target for a longer time than the segments at the ends of the probe.

20.