Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
3.
4.
5.
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results was the finding that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short-range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than within free-living archaea and bacteria.
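
As an illustration of the quantities this abstract refers to, the following minimal sketch computes overlapping oligonucleotide (k-mer) frequencies and GC content for a single sequence. The function names `kmer_frequencies` and `gc_content` are hypothetical, and the abstract does not describe the authors' actual pipeline or bias statistics, so this is only one simple way to obtain such frequencies.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy sequence, not real genomic data
    print(kmer_frequencies(toy, 4))
    print(gc_content(toy))
```

Running the same functions separately on coding and non-coding segments of a chromosome would give the kind of per-region comparison the abstract describes, assuming the segment coordinates are available from an annotation.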

6.
7.
To study possible relationships between an organism's genomic DNA curvature and the amino acid composition of its proteome, every peptidic sequence from fully determined genomes was retrotranslated using the E. coli codon preferences, and the curvature profiles of the resulting DNA sequences were calculated and compared. A clear interdependence between these two variables was observed, as each retrotranslated proteome presented a distinctive, statistically significant DNA curvature profile biased toward its natural DNA curvature profile. In addition, by comparing the profiles arising from real and randomly permuted proteomes, we also found a position-dependent contribution of the peptidic sequence to DNA curvature. These results support the idea of a possible selection toward a specific global curvature of genomes.

8.
Centromere parC of plasmid R1 is curved   Total citations: 2 (self-citations: 1; citations by others: 1)
The centromere sequence parC of Escherichia coli low-copy-number plasmid R1 consists of two sets of 11 bp iterated sequences. Here we analysed the intrinsic sequence-directed curvature of parC by its migration anomaly in polyacrylamide gels. The 159 bp long parC is strongly curved with anomaly values (k-factors) close to 2. The properties of the parC curvature agree with those of other curved DNA sequences. parC contains two regions of 5-fold repeated iterons separated by 39 bp. We modified 4 bp within this intermediate sequence so that we could analyse the two 5-fold repeated regions independently. The analysis shows that the two repeat regions are not independently curved parts of parC but that the overall curvature is a property of the whole fragment. Since the centromere sequence of an E. coli plasmid as well as eukaryotic centromere sequences show DNA curvature, we speculate that curvature might be a general property of centromeres.

9.
10.
We aligned and analyzed 100 pairs of complete, orthologous intergenic regions from the human and mouse genomes (average length approximately 12 000 nucleotides). The alignments alternate between highly similar segments and dissimilar segments, indicating a wide variation of selective constraint. The average number of selectively constrained nucleotides within a mammalian intergenic region is at least 2000. This is threefold higher than within a nematode intergenic region and at least twofold higher than the number of selectively constrained nucleotides coding for an average protein. Because mammals possess only two- to threefold more proteins than Caenorhabditis elegans, the higher complexity of mammals might be primarily due to the functioning of intergenic DNA.

11.
Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (< 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of Poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].

12.
Eukaryotes and archaea both possess multiple genes coding for family B DNA polymerases. In animals and fungi, three family B DNA polymerases, alpha, delta, and epsilon, are responsible for replication of nuclear DNA. We used a PCR-based approach to amplify and sequence phylogenetically conserved regions of these three DNA polymerases from Giardia intestinalis and Trichomonas vaginalis, representatives of early-diverging eukaryotic lineages. Phylogenetic analysis of eukaryotic and archaeal paralogs suggests that the gene duplications that gave rise to the three replicative paralogs occurred before the divergence of the earliest eukaryotic lineages, and that all eukaryotes are likely to possess these paralogs. One eukaryotic paralog, epsilon, consistently branches within archaeal sequences to the exclusion of other eukaryotic paralogs, suggesting that an epsilon-like family B DNA polymerase was ancestral to both archaea and eukaryotes. Because crenarchaeote and euryarchaeote paralogs do not form monophyletic groups in phylogenetic analysis, it is possible that archaeal family B paralogs themselves evolved by a series of gene duplications independent of the gene duplications that gave rise to eukaryotic paralogs.

13.
Summary: In a previous publication it was shown that the output of yeast mitochondrial loci lacking nearby intergenic sequences (encompassing ori/rep elements) was reduced in crosses to strains with wild-type mtDNAs. In the present work, mitochondrial genomes carrying the intergenic deletions were marked at unlinked loci by introducing specific antibiotic resistance mutations against erythromycin, oligomycin and paromomycin. These marked genomes were used to follow the output of unlinked regions of the genome from crosses between the intergenic deletion mutants and wild-type strains. Transmission of genetically unlinked markers in coding regions was substantially reduced when an intergenic deletion was present on the same genome. In general, the transmission of the antibiotic markers was the same as or slightly higher than that of the corresponding intergenic marker. These results indicate that the presence of an intergenic deletion in the regions studied impairs the transmission to progeny of a mitochondrial genome as a whole. More specifically, the results suggest that ori/rep sequences, present in the regions that have been deleted, confer a competitive advantage over genomes lacking a full complement of such sequences. These results support the hypothesis that intergenic sequences, and specifically ori/rep elements, have a biological role in the mitochondrial genome. However, because of the exclusive presence of ori/rep sequences in the genus Saccharomyces, it may be that these sequences evolved in (or invaded) the mitochondrial genome relatively late in the evolution of the yeasts. Therefore, in a more general sense, variations in the amount and structure of intergenic sequences in various yeasts may reflect processes that have been of selective advantage in the metabolism of individual mitochondrial DNA in a particular environment and that have not drastically interrupted the respiratory phenotype.

14.
BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

15.
We compared levels of sequence divergence between fourfold synonymous coding sites and noncoding sites from the intergenic and intronic regions of the Plasmodium falciparum and Plasmodium reichenowi genomes. We observed significant differences in the level of divergence between these classes of silent sites. Fourfold synonymous coding sites exhibited the highest level of sequence divergence, followed by introns, and then intergenic sequences. This pattern of relative divergence rates has been observed in primate genomes but was unexpected in Plasmodium due to a paucity of variation at silent sites in P. falciparum and the corollary hypothesis that silent sites in this genome may be subject to atypical selective constraints. Exclusion of hypermutable CpG dinucleotides reduces the divergence level of synonymous coding sites to that of intergenic sites but does not diminish the significantly higher divergence level of introns relative to intergenic sites. A greater than expected incidence of CpG dinucleotides in intergenic regions less than 500 bp from genes may indicate selective maintenance of regulatory motifs containing CpGs. Divergence rates of different classes of silent sites in these Plasmodium genomes are determined by a combination of mutational and selective pressures.
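
The phrase "greater than expected incidence of CpG dinucleotides" can be made concrete with a commonly used observed/expected ratio, sketched below. This is an assumption for illustration: the abstract does not state how the authors computed the expected incidence, and the function name `cpg_obs_exp` is hypothetical. The expectation used here is the product of the C and G base frequencies times the sequence length.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since the
    expectation is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # Count every position where a C is immediately followed by a G.
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return cpg_count * n / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequences only; a ratio above 1 means more CpGs than base composition predicts.
    for toy in ("CCGG", "CGCG", "CCAGG"):
        print(toy, cpg_obs_exp(toy))
```

A ratio well above 1 in a window near a gene would correspond to the kind of CpG excess the abstract reports, under the stated assumption about the expectation.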

16.
Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and atp synthase genes are the least divergent, and the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33–45 direct and inverted repeats ≥30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity, underscoring the importance of choosing appropriate intergenic spacers for plastid transformation and providing valuable new information for phylogenetic utility of the chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed a considerable amount of variation, resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that there has been a loss of conserved editing sites in potato and tomato. Electronic Supplementary Material: Supplementary material is available for this article and is accessible to authorized users.

17.
18.
Prokaryotic genomes are considered to be 'wall-to-wall' genomes, which consist largely of genes for proteins and structural RNAs, with only a small fraction of the genomic DNA allotted to intergenic regions, which are thought to typically contain regulatory signals. The majority of bacterial and archaeal genomes contain 6-14% non-coding DNA. Significant positive correlations were detected between the fraction of non-coding DNA and inter- and intra-operonic distances, suggesting that different classes of non-coding DNA evolve congruently. In contrast, no correlation was found between any of these characteristics of non-coding sequences and the number of genes or genome size. Thus, the non-coding regions and the gene sets in prokaryotes seem to evolve in different regimes. The evolution of non-coding regions appears to be determined primarily by the selective pressure to minimize the amount of non-functional DNA while maintaining essential regulatory signals. As a result, the content of non-coding DNA in different genomes is relatively uniform, and intra- and inter-operonic non-coding regions evolve congruently. In contrast, the gene set is optimized for the particular environmental niche of the given microbe, which results in the lack of correlation between the gene number and the characteristics of non-coding regions.
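
The "fraction of non-coding DNA" discussed in this abstract can be sketched as the share of a chromosome not covered by any annotated gene. The example below is a minimal illustration under stated assumptions: gene coordinates are given as 0-based half-open `(start, end)` intervals on a linear sequence, strand is ignored, and the name `noncoding_fraction` is hypothetical rather than taken from the paper.

```python
def noncoding_fraction(genome_length, genes):
    """Fraction of the genome not covered by any gene interval.

    genes is an iterable of (start, end) pairs, 0-based and half-open.
    Overlapping or adjacent genes are merged so that shared bases are
    counted only once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(genes):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return (genome_length - covered) / genome_length


if __name__ == "__main__":
    # Toy annotation: two overlapping genes and one separate gene on a 1000 bp sequence.
    print(noncoding_fraction(1000, [(0, 400), (300, 600), (700, 900)]))
```

Merging overlapping intervals first avoids double counting bases in overlapping genes, which would otherwise understate the non-coding fraction in compact prokaryotic genomes.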

19.
In recent years, various families of small non-coding RNAs (sRNAs) have been discovered by experimental and computational approaches, both in bacterial and eukaryotic genomes. Although most of them await elucidation of their function, it has been reported that some play important roles in gene regulation. Here we carried out comparative genomics analysis of possible sRNAs that are computationally identified in 30 bacterial genomes from gamma- and alpha-proteobacteria and Deinococcus radiodurans. Identified sRNAs are clustered by a complete-linkage clustering method to assess conservation among the organisms. On average, sRNAs are found in approximately 30% of intergenic regions of each genome sequence. Of these, 25.7% are conserved among three or more organisms. Approximately 60% of the conserved sRNAs are not located in orthologous intergenic regions, implying that sRNAs may have shuffled their positions in genomes. The current study implies that sRNAs may be involved in a more extensive range of functions in bacteria.

20.