Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Static DNA curvature distributions of fully sequenced genomes and large DNA contigs from different organisms were calculated. Very distinctive differences among histogram profiles coming from archaebacteria, eubacteria, and eukaryotes were observed. Eubacterial profiles were, on average, more curved than were archaeal and eukaryotic profiles. A comparative analysis between real and randomized DNA sequences revealed that eubacterial genomes presented, overall, higher curvature values than random sequences. Archaeal and eukaryotic genomes showed the opposite pattern: they displayed a lower frequency of curved regions than their corresponding randomized sequences. The contributions of coding and intergenic regions to the curvature profile were also analyzed. Intergenic regions, on average, were found to be more curved than the overall genomic sequences, especially in prokaryotic organisms. Nevertheless, because of their small size with respect to coding regions, the contribution of intergenic sequences to the overall curvature profile tended to be minor. A clear relationship between codon usage and DNA curvature was demonstrated, and a proposal of the possible coevolution of both systems is discussed. Finally, we present a procedure to quantify the deviation of a curvature profile from randomness through a formal statistical analysis.

2.
Summary: We have investigated the intragenomic DNA sequence homologies of twelve species of birds representing five orders, and emphasizing Galliformes. This study differs in two important ways from the classical approaches taken in constructing and evaluating phylogenies based on DNA sequence similarities. Comparisons are made on the basis of sequence homologies within genomes of related birds, rather than between genomes. DNA is reassociated at 50°C in 0.5 M phosphate buffer; these conditions allow formation and detection of duplexes containing more mismatch than would normally be permitted using more stringent conditions, affording an opportunity to observe more ancient sequence homologies. Thermal stability profiles of DNA duplexes formed under these conditions are the basis of comparison; three general patterns were observed. This approach emphasizes differences in sequence composition between genomes while the more traditional method of intergenomic tracer DNA hybridization at higher stringency emphasizes sequence similarities. No correlation was found between taxonomic position and intragenomic sequence composition, either within or between lineages. The thermal stability profiles of DNA duplexes formed within avian genomes did not reflect the biological similarities inferred from morphology, karyotype, and studies of interspecific hybridization. While all of the differences observed could have occurred over geological time, it was surprising that the genomes of the domestic chicken and the Red Jungle Fowl (Gallus gallus) differ in their sequence compositions. It appears that amplification/reduction events and/or positional changes occur rather often during evolution of a lineage.
Abbreviations: SDS, sodium dodecyl sulphate; PB, equimolar sodium phosphate buffer, pH 6.8; Cot, concentration of DNA in moles of nucleotide per liter times the incubation time in seconds; Equiv. Cot or Equivalent Cot, Cot corrected for the monovalent cation concentration effect on reassociation rate; HAP, hydroxylapatite; Te1/2, temperature at which one-half the DNA has eluted from HAP; SSC, 0.15 M sodium chloride-0.015 M sodium citrate.

3.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares 4^(-m) in size (m being the desired resolution), and the point density counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that reflects the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn, by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
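The four steps described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the corner assignment for A, C, G and T, the default of 10 equal-width density bins, and all function names are assumptions made here for the example.

```python
import math
from collections import Counter

# Assumed corner layout (a common CGR convention, not taken from the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos-game representation: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cell_counts(points, m):
    """Step 1: count points in every cell of a 2^m x 2^m grid (cell area 4^-m)."""
    side = 2 ** m
    counts = Counter()
    for x, y in points:
        col = min(int(x * side), side - 1)  # clamp guards against rounding to 1.0
        row = min(int(y * side), side - 1)
        counts[(col, row)] += 1
    # Empty cells are included so that they contribute to the density histogram.
    return [counts.get((c, r), 0) for c in range(side) for r in range(side)]


def histogram_entropy(densities, n_bins=10):
    """Steps 2 and 3: bin the cell densities, then apply Shannon's formula (bits)."""
    lo, hi = min(densities), max(densities)
    if hi == lo:
        return 0.0  # every cell has the same density: a single occupied bin
    width = (hi - lo) / n_bins
    bins = Counter(min(int((d - lo) / width), n_bins - 1) for d in densities)
    total = len(densities)
    return -sum((k / total) * math.log2(k / total) for k in bins.values())


def entropic_profile(seq, max_m=4, n_bins=10):
    """Step 4: histogram entropy at each resolution level m = 1..max_m."""
    points = cgr_points(seq)
    return [histogram_entropy(cell_counts(points, m), n_bins)
            for m in range(1, max_m + 1)]
```

Calling `entropic_profile(sequence, max_m=6)` returns one entropy value per resolution level. The choice of bin intervals strongly affects the values, and the abstract does not specify how the authors adjusted them, so the equal-width binning here is only a placeholder.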

4.
It has been established that the precise positioning of nucleosomes on genomic DNA can be achieved, at least for a minority of them, through sequence-dependent processes. However, to what extent DNA sequences play a role in the positioning of the major part of nucleosomes is still debated. The aim of the present study is to examine to what extent long-range correlations (LRC) are related to the presence of nucleosomes. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. The exploration of a number of eukaryotic and bacterial genomes through the optics of the so-called "wavelet transform microscope" reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. Here, we focus on the existence of LRC in the small-scale regime (10-200 bp) which are actually observed in eukaryotic genomes, in contrast to their absence in eubacterial genomes. Analysis of viral DNA genomes shows that, like their hosts' genomes, eukaryotic viruses present LRC but eubacterial viruses do not. There is one exception for genomes of poxviruses (Vaccinia and Melanoplus sanguinipes) which do not replicate in the cell nucleus and do not exhibit LRC. No small-scale LRC are detected in the genomes of all examined RNA viruses, with the exception of retroviruses. These results together with the observation of LRC between particular sequence motifs known to participate in the formation of nucleosomes (e.g. AA dinucleotides) strongly suggest that the 10-200 bp LRC are a signature of the sequence-dependence of nucleosome positioning. Finally, we discuss possible interpretations of these LRC in terms of the physical mechanisms that might govern the positioning and the dynamics of the nucleosomes along the DNA chain through cooperative processes.

5.
Until now, the genomic DNA of all eubacteria analyzed has been hyper-curved, its global intrinsic curvature being higher than that of a random sequence. In contrast, that rule failed for archaea or eukaryotes, which could be either hypo- or hyper-curved. The existence of the rule suggested that, at least for eubacteria, global intrinsic curvature is adaptive. However, the present results from analyzing 21 eubacterial and six archaeal genomes argue against adaptation. First, there are two eubacterial exceptions to the former rule. More significantly, we found that the dinucleotide composition of the genome alone (which lacks all sequence information) is enough to determine the genome curvature. Additional evidence against adaptation came from showing that the global curvature of bacterial genomes could not have evolved under either of two complementary models of curvature selection: (i) that curvature is selected locally from unbiased variability; (ii) that curvature is established globally through the selection of a curvature-altering mutational bias. We found that the observed relationship between curvature and dinucleotide composition is incompatible with model (i). We also found that, contrary to the predictions of model (ii), the dinucleotide compositions of bacterial genomes were not statistically special in their curvature-related properties (when compared to stochastically generated dinucleotide compositions).

6.
7.
Recently, a highly repetitive DNA sequence family (GRS) from tobacco was described in our laboratory. These sequences were found to be localized predominantly in the pericentromeric heterochromatin of tobacco chromosomes. To test the hypothesis that these sequences play an important role in the formation of heterochromatin, we investigated the DNA curvature of the GRS sequences and its possible impact on the chromatin structure at these loci. Application of the nearest-neighbour wedge model of intrinsic DNA curvature for the GRS1 family member predicted two loci of curvature: a major bend at the 5′ end of the sequence and a minor bend of opposite direction at the centre of the GRS1. The presence of the major and the minor loci of DNA curvature was studied experimentally using permutation analysis and site-directed mutagenesis. The experimental results were consistent with the computer predictions. We provide evidence that the described DNA curvature is also present in the entire GRS family. Genomic statistical sequencing showed the conservation of the major bend sequence determinants in the members of the GRS family. To investigate the chromatin structure at the GRS sequences, we determined the nucleosome positioning in vivo at these sequences using thermal cycle primer extension. A relation between the curvature pattern and the histone octamer position was observed: the major bend is excluded from the nucleosome surface to the linker region, while the minor bend is distributed along the core DNA. The suggestion is made that the sequences in the minor locus of curvature define the rotational setting of the nucleosome, and a possible role of the major bend as a factor that defines the translational setting is discussed.

8.
Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches.

9.
Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called "wavelet transform microscope" reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (≲ 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of Poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].

10.
MOTIVATION: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal. Linguistic complexity corresponds to repetitiveness of a genomic text, and potential regulatory sites may be discovered through construction of typical patterns of complexity distribution. RESULTS: We developed software for fast calculation of linguistic sequence complexity of DNA sequences. Our program utilizes suffix trees to compute the number of subwords present in genomic sequences, thereby allowing calculation of linguistic complexity in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Maps of complexity along the entire genome were obtained using sliding windows of 40, 100, and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around the starts of translation were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences in profiles of the GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in profiles of AT genomes, which probably reflect individual species variations in translational regulation. AVAILABILITY: The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
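To make the idea of counting subwords in a sliding window concrete, here is a small Python sketch. It is not the authors' program: it replaces their linear-time suffix-tree approach with a naive set-based enumeration that is only practical for short windows, and it assumes one possible definition of linguistic complexity (distinct subwords observed divided by the maximum number possible for that length and alphabet). The function names are hypothetical.

```python
def distinct_subwords(seq, k):
    """Number of different substrings of length k that occur in seq."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


def max_subwords(n, k, alphabet_size=4):
    """Upper bound on distinct k-mers in a sequence of length n."""
    return min(alphabet_size ** k, n - k + 1)


def linguistic_complexity(seq, max_k=None, alphabet_size=4):
    """Observed distinct subwords over the maximum possible, for k = 1..max_k.

    Assumed definition for illustration; naive enumeration, not a suffix tree.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    max_k = n if max_k is None else min(max_k, n)
    observed = sum(distinct_subwords(seq, k) for k in range(1, max_k + 1))
    possible = sum(max_subwords(n, k, alphabet_size) for k in range(1, max_k + 1))
    return observed / possible


def sliding_complexity(seq, window, step=1, max_k=None):
    """Complexity map along a sequence using fixed-size windows."""
    return [linguistic_complexity(seq[i:i + window], max_k)
            for i in range(0, len(seq) - window + 1, step)]
```

Under this definition a homopolymer such as `AAAA` scores 0.4 and a window with no repeated subword such as `ACGT` scores 1.0, so simple sequence repeats show up as dips in the output of `sliding_complexity`.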

11.
12.
Proteogenomics     
Renuse S  Chaerkady R  Pandey A 《Proteomics》2011,11(4):620-630
The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task with the majority of initial genome annotation dependent on the use of gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable generation of accurate and comprehensive protein sequence of tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate proteogenomics to become a powerful approach that will be routinely employed by 'Genome and Proteome Centers' of the future.

13.
Timely classification and identification of bacteria is of vital importance in many areas of public health. We present a mass spectrometry (MS)-based proteomics approach for bacterial classification. In this method, a bacterial proteome database is derived from all potential protein coding open reading frames (ORFs) found in 170 fully sequenced bacterial genomes. Amino acid sequences of tryptic peptides obtained by LC-ESI MS/MS analysis of the digest of bacterial cell extracts are assigned to individual bacterial proteomes in the database. Phylogenetic profiles of these peptides are used to create a matrix of sequence-to-bacterium assignments. These matrixes, viewed as specific assignment bitmaps, are analyzed using statistical tools to reveal the relatedness between a test bacterial sample and the microorganism database. It is shown that, if a sufficient amount of sequence information is obtained from the MS/MS experiments, a bacterial sample can be classified to a strain level by using this proteomics method, leading to its positive identification.

14.
The gene composition of present-day genomes has been shaped by a complicated evolutionary history, resulting in diverse distributions of genes across genomes. The pattern of presence and absence of a gene in different genomes is called its phylogenetic profile. It has been shown that proteins whose encoding genes have highly similar profiles tend to be functionally related: As these genes were gained and lost together, their encoded proteins can probably only perform their full function if both are present. However, a large proportion of genes encoding interacting proteins do not have matching profiles. In this study, we analysed one possible reason for this, namely that phylogenetic profiles can be affected by multi-functional proteins such as shared subunits of two or more protein complexes. We found that by considering triplets of proteins, of which one protein is multi-functional, a large fraction of disturbed co-occurrence patterns can be explained.
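A toy Python example may help to picture why a shared subunit can break pairwise profile matching. Everything in it is hypothetical: the gene names, the six-genome profiles, and in particular the assumption that a multi-functional protein's profile resembles the logical OR of its two partners' profiles. The abstract does not state how the authors scored triplets.

```python
def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def union_profile(p, q):
    """Profile expected for a protein needed wherever either partner occurs."""
    return tuple(a | b for a, b in zip(p, q))


# Hypothetical profiles over six genomes (1 = gene present, 0 = absent).
profiles = {
    "geneA": (1, 1, 0, 0, 1, 0),  # subunit specific to complex 1
    "geneB": (0, 0, 1, 1, 0, 0),  # subunit specific to complex 2
    "geneS": (1, 1, 1, 1, 1, 0),  # subunit shared by both complexes
}

# Pairwise comparison: the shared subunit matches neither partner well.
pair_a = hamming(profiles["geneS"], profiles["geneA"])
pair_b = hamming(profiles["geneS"], profiles["geneB"])

# Triplet comparison: it matches the combined profile of both partners exactly.
triplet = hamming(profiles["geneS"],
                  union_profile(profiles["geneA"], profiles["geneB"]))
```

Here `pair_a` is 2 and `pair_b` is 3, while `triplet` is 0, which is the kind of disturbed co-occurrence pattern that a triplet view can account for in this simplified setting.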

15.
The concept of a ‘proteomic constraint’ proposes that the information content of the proteome exerts a selective pressure to reduce mutation rates, implying that larger proteomes produce a greater selective pressure to evolve or maintain DNA repair, resulting in a decrease in mutational load. Here, the distribution of 21 recombination repair genes was characterized across 900 bacterial genomes. Consistent with prediction, the presence of 17 genes correlated with proteome size. Intracellular bacteria were marked by a pervasive absence of recombination repair genes, consistent with their small proteome sizes, but also consistent with alternative explanations that reduced effective population size or lack of recombination may decrease selection pressure. However, when only non-intracellular bacteria were examined, the relationship between proteome size and gene presence was maintained. In addition, the more widely distributed (i.e. conserved) a gene, the smaller the average size of the proteomes from which it was absent. Together, these observations are consistent with the operation of a proteomic constraint on DNA repair. Lastly, a correlation between gene absence and genome AT content was shown, indicating a link between absence of DNA repair and elevated genome AT content.

16.
《Epigenetics》2013,8(10):1329-1338
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.

17.

18.
19.
The ultimate goal of proteomics is to understand complex biological systems. The first step toward this end is the discovery of protein differences by profiling a given proteome. One approach to proteome profiling is to fractionate it into intact proteins, with subsequent identification and quantitation. In this work, lysates of bovine skeletal muscle were prepared. Reproducible proteome profiles were generated by an automatic two-dimensional protein fractionation system. Proteins were separated by isoelectric point and then by hydrophobicity. The data collected from both separations were used to generate proteome profiles. A high protein content fraction with pI above 8.5 was digested with trypsin, and its main protein component was identified as lysozyme C by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry.

20.
The low molecular weight plasma proteome and its biological relevance are not well defined; therefore, experiments were conducted to directly sequence and identify peptides observed in plasma and serum protein profiles. Protein fractionation, matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) profiling, and liquid-chromatography coupled to MALDI tandem mass spectrometry (MS/MS) sequencing were used to analyze the low molecular weight proteome of heparinized plasma. Four fractionation techniques using functionally derivatized 96-well plates were used to extract peptides from plasma. Tandem TOF was successful for identifying peptides up to m/z 5500 with no prior knowledge of the sequence and was also used to verify the sequence assignments for larger ion signals. The peptides (n>250) sequenced in these profiles came from a surprisingly small number of proteins (n approximately 20), which were all common to plasma, including fibrinogen, complement components, antiproteases, and carrier proteins. The cleavage patterns were consistent with those of known plasma proteases, including initial cleavages by thrombin, plasmin and complement proteins, followed by aminopeptidase and carboxypeptidase activity. On the basis of these data, we discuss limitations in biomarker discovery in the low molecular weight plasma or serum proteome using crude fractionation coupled to MALDI-MS profiling.
