Similar Literature
 Found 20 similar articles (search time: 31 ms)
1.
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers discussing ISP have been published in the genomic era, scientists have yet to find all the factors responsible for such a universal phenomenon in chromosomes. In the present work, we address the issue from a new perspective, which is a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size. The frequency distributions of the mono/oligonucleotides among the regions were then compared using the Kolmogorov–Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides were statistically similar, a feature we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a species in bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa, etc.) and chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and composition of forward-encoded sequences between the strands in chromosomes were found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
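
The windowed comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the choice to count overlapping occurrences inside each window, and the use of the raw two-sample Kolmogorov–Smirnov statistic (without a p-value) are all assumptions made for the example.

```python
from bisect import bisect_right

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def window_counts(seq, oligo, window):
    """Count overlapping occurrences of `oligo` in each non-overlapping window.

    Assumption: a trailing partial window is dropped.
    """
    k = len(oligo)
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts.append(sum(chunk[i:i + k] == oligo for i in range(window - k + 1)))
    return counts


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical toy sequence; real analyses would use whole chromosomes.
seq = "AAAATTTT" * 4
oligo = "AA"
d = ks_statistic(
    window_counts(seq, oligo, 8),
    window_counts(seq, reverse_complement(oligo), 8),
)
```

A statistic near 0 means the two per-window count distributions are nearly identical; turning it into a significance decision would require a p-value, for which an established statistics library is the safer choice.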

2.
Forsdyke  Donald R. 《Genetica》2021,149(2):81-88

Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments.


3.
Our studies on the bases of codons from 11 completely sequenced archaeal genomes show that, as we move from GC-rich to AT-rich protein-coding gene-containing species, the differences between G and C and between A and T, the purine load (AG content), and also the overall persistence (i.e. the tendency of a base to be followed by the same base) within codons, all increase almost simultaneously, although the extent of increase is different over the three positions within codons. These findings suggest that the deviations from the second parity rule (through the increasing differences between complementary base contents) and the increasing purine load hinder the chance of formation of the intra-strand Watson-Crick base-paired secondary structures in mRNAs (synonymous with the protein-coding genes we dealt with), thereby increasing the translational efficiency. We hypothesize that the AT-rich protein-coding gene-containing archaeal species might have better translational efficiency than their GC-rich counterparts.

4.
Shannon’s information theoretic perspective of communication helps one to understand the storage and processing of information in one-dimensional sequences. An information theoretic analysis of 937 available completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Information content (Id) values were used to cluster these chromosomes. Chargaff’s second parity rule, i.e., compositional self-complementarity, an empirical fact, is observed in all the genomes except for the proteobacterium Candidatus Hodgkinia cicadicola. High information content, arising out of biased base composition in all the 14 chromosomes of Plasmodium falciparum, is also found in two prokaryotic genomes, viz. Buchnera aphidicola str. Cc (Cinara cedri) and Candidatus Carsonella ruddii PV. Despite size and compositional variations, both prokaryotic and eukaryotic genomes do not deviate significantly from an equiprobable and random situation. Eukaryotic chromosomes of an organism tend to have similar informational restraints, as seen when a simple distance-based method is used to cluster them. In eukaryotes, in certain cases, Id values are also similar for the two arms (p and q arm) of the chromosomes. The results of this study confirm that the information content can provide insights into the clustering of genomes and the evolution of messaging strategies of the genomes. An efficient and robust Perl CGI standalone tool based on this information theory algorithm was created for the analysis of whole genomes and is made available at https://github.com/AlagurajVeluchamy/InformationTheory.

5.
Chargaff’s second parity rule for short oligonucleotides states that the frequency of any short nucleotide sequence on a strand is approximately equal to the frequency of its reverse complement on the same strand. Recent studies have shown that, with the exception of organellar DNA, this parity rule generally holds for double-stranded DNA genomes and fails to hold for single-stranded genomes. While Chargaff’s first parity rule is fully explained by the Watson–Crick pairing in the DNA double helix, a definitive explanation for the second parity rule has not yet been determined. In this work, we propose a model based on a hidden Markov process for approximating the distributional structure of primitive DNA sequences. Then, we use the model to provide another possible theoretical explanation for Chargaff’s second parity rule, and to predict novel distributional aspects of bacterial DNA sequences.
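
The statement of the second parity rule for oligonucleotides can be checked on a single strand with a few lines of code. The sketch below is illustrative only: the deviation score (half the summed absolute count difference between each k-mer and its reverse complement, divided by the total k-mer count) is an assumed summary measure, not one taken from the abstract.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count all overlapping k-mers on the given strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def parity_deviation(seq, k):
    """Score in [0, 1]: 0 means every k-mer is exactly as frequent as its
    reverse complement on the same strand; 1 means no k-mer has a partner."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    words = set(counts) | {revcomp(w) for w in counts}
    # Each (w, revcomp(w)) pair is visited twice, hence the division by 2.
    diff = sum(abs(counts[w] - counts[revcomp(w)]) for w in words)
    return diff / (2 * total)
```

On real genomes the score would be compared across values of k and against shuffled controls; the toy inputs in a quick check (for example `parity_deviation("AATT", 1)`) only confirm that the arithmetic behaves as described.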

6.
7.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

8.
Haldane's rule is an empirical phenomenon that has been observed in animals with sex chromosomes. The rule states that the heterogametic sex (XY or ZW) will be “absent, rare, or sterile” following hybridization between two species. Despite the near ubiquity of Haldane's rule in animal hybridizations, it has not been documented in organisms other than animals. Here, we show evidence for both rarity and sterility in hybrid male but not female offspring in crosses between three dioecious plant species from the genus Silene with heteromorphic (XY) sex chromosomes. Our results are consistent with Haldane's rule, extending its applicability to plants with sex chromosomes.

9.
For almost 50 years, a conclusive explanation of Chargaff’s second parity rule (CSPR), the equality of frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand, has not been found. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson–Crick base pairing and CSPR together. Alternatively, assuming a Natural symmetry law for DNA creation, namely that each trinucleotide in one strand of DNA must simultaneously appear also in the opposite strand, automatically leads to Q-box direct-reverse mirror symmetry, which in conjunction with Watson–Crick base pairing generates CSPR. We demonstrate the quadruplet symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D-diagrams with combined interstrand frequencies. These “landscapes” are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most of the older species. In human chromosomes 1–12, X and Y the “landscapes” are almost identical, and they are slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.

10.
Locey KJ  White EP 《PloS one》2011,6(2):e14651

Background

The study of large-scale genome structure has revealed patterns suggesting the influence of evolutionary constraints on genome evolution. However, the results of these studies can be difficult to interpret due to the conceptual complexity of the analyses. This makes it difficult to understand how observed statistical patterns relate to the physical distribution of genomic elements. We use a simpler and more intuitive approach to evaluate patterns of genome structure.

Methodology/Principal Findings

We used randomization tests based on Morisita's Index of aggregation to examine average differences in the distribution of purines and pyrimidines among coding and noncoding regions of 261 chromosomes from 223 microbial genomes representing 21 phylum level groups. Purines and pyrimidines were aggregated in the noncoding DNA of 86% of genomes, but were only aggregated in the coding regions of 52% of genomes. Coding and noncoding DNA differed in aggregation in 94% of genomes. Noncoding regions were more aggregated than coding regions in 91% of these genomes. Genome length appears to limit aggregation, but chromosome length does not. Chromosomes from the same species are similarly aggregated despite substantial differences in length. Aggregation differed among taxonomic groups, revealing support for a previously reported pattern relating genome structure to environmental conditions.

Conclusions/Significance

Our approach revealed several patterns of genome structure among different types of DNA, different chromosomes of the same genome, and among different taxonomic groups. Similarity in aggregation among chromosomes of varying length from the same genome suggests that individual chromosome structure has not evolved independently of the general constraints on genome structure as a whole. These patterns were detected using simple and readily interpretable methods commonly used in other areas of biology.
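
The randomization approach in this entry rests on Morisita's index, which for n equal-sized windows with counts x_i and total N = Σ x_i is I_d = n · Σ x_i(x_i − 1) / (N(N − 1)); values above 1 indicate aggregation. The sketch below applies it to purine counts per window and estimates a one-sided p-value by shuffling the sequence. The window layout, the shuffle-based null model, the number of trials, and all function names are assumptions for illustration; the abstract does not specify the authors' exact procedure.

```python
import random

PURINES = frozenset("AG")


def morisita_index(counts):
    """Morisita's index of dispersion for per-window counts."""
    n = len(counts)
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two counted items")
    return n * sum(x * (x - 1) for x in counts) / (total * (total - 1))


def purine_counts(seq, window):
    """Number of purines in each non-overlapping window (partial tail dropped)."""
    return [
        sum(base in PURINES for base in seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def randomization_p(seq, window, trials=200, seed=0):
    """Fraction of shuffled sequences at least as aggregated as the original."""
    rng = random.Random(seed)
    observed = morisita_index(purine_counts(seq, window))
    bases = list(seq)
    hits = 0
    for _ in range(trials):
        rng.shuffle(bases)
        if morisita_index(purine_counts("".join(bases), window)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

A small p-value from `randomization_p` would suggest that purines are more clustered than expected from base composition alone, which is the kind of comparison the abstract reports for coding versus noncoding DNA.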

11.
In 1968, Chargaff and his colleagues discovered a rule in Bacillus subtilis: in single stranded DNA, A=T and C=G. This rule has since been confirmed many times in other bacterial and eukaryotic genomes. To the best of our knowledge, this rule has not been tested before in either single stranded DNA or RNA genomes. Over 3400 genomic sequences were examined here and included for the first time both double and single stranded DNA and RNA genomes. We found that: (1) with the exception of the organellar DNA, this parity rule holds for all types of double stranded DNA genomes and (2) that this rule fails to hold for other types of genomes. The parity rule appears to be a selective force on genome evolution and codon use.

12.
MOTIVATION: Over sufficiently long windows, complementary strands of DNA tend to have the same base composition. A few reports have indicated that this first-order parity rule extends at higher orders to oligonucleotide composition, at least in some organisms or taxa. However, the scientific literature falls short of providing a comprehensive study of reverse-complement symmetry at multiple orders and across the kingdoms of life. It also lacks a characterization of this symmetry and a convincing explanation or clarification of its origin. RESULTS: We develop methods to measure and characterize symmetry at multiple orders, and analyze a wide set of genomes, encompassing single- and double-stranded RNA and DNA viruses, bacteria, archaea, mitochondria, and eukaryota. We quantify symmetry at orders 1 to 9 for contiguous sequences and pools of coding and non-coding upstream regions, compare the observed symmetry levels to those predicted by simple statistical models, and factor out the effect of lower-order distributions. We establish the universality and variability range of first-order strand symmetry, as well as of its higher-order extensions, and demonstrate the existence of genuine high-order symmetric constraints. We show that ubiquitous reverse-complement symmetry does not result from a single cause, such as point mutation or recombination, but rather emerges from the combined effects of a wide spectrum of mechanisms operating at multiple orders and length scales.

13.
Chargaff's first parity rule (%A=%T and %G=%C) is explained by the Watson-Crick model for duplex DNA in which complementary base pairs form individual accounting units. Chargaff's second parity rule is that the first rule also applies to single strands of DNA. The limits of accounting units in single strands were examined by moving windows of various sizes along sequences and counting the relative proportions of A and T (the W bases), and of C and G (the S bases). Shuffled sequences account, on average, over shorter regions than the corresponding natural sequence. For an E. coli segment, S base accounting is, on average, contained within a region of 10 kb, whereas W base accounting requires regions in excess of 100 kb. Accounting requires the entire genome (190 kb) in the case of Vaccinia virus, which has an overall "Chargaff difference" of only 0.086% (i.e. only one in 1162 bases does not have a potential pairing partner in the same strand). Among the chromosomes of Saccharomyces cerevisiae, the total Chargaff differences for the W bases and for the S bases are usually correlated. In general, Chargaff differences for a natural sequence and its shuffled counterpart diverge maximally when 1 kb sequence windows are employed. This should be the optimum window size for examining correlations between Chargaff differences and sequence features which have arisen through natural selection. We propose that Chargaff's second parity rule reflects the evolution of genome-wide stem-loop potential as part of short- and long-range accounting processes which work together to sustain the integrity of various levels of information in DNA.
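
The moving-window accounting described here can be sketched as below. The definition used, |A − T| and |C − G| expressed as a percentage of the window length, is an assumption chosen to be consistent with the "one in 1162 bases" reading of a 0.086% difference; the abstract does not spell out the exact formula, and the function name and `step` parameter are hypothetical.

```python
def chargaff_differences(seq, window, step=None):
    """Return (start, W difference %, S difference %) for each window.

    W difference: |A - T| as a percentage of the window length.
    S difference: |C - G| as a percentage of the window length.
    `step` defaults to `window`, giving non-overlapping windows; a smaller
    step gives overlapping (sliding) windows.
    """
    step = step or window
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        w_diff = abs(chunk.count("A") - chunk.count("T")) / window * 100
        s_diff = abs(chunk.count("C") - chunk.count("G")) / window * 100
        results.append((start, w_diff, s_diff))
    return results


# Hypothetical toy input: the first window is all A, the second is balanced.
profile = chargaff_differences("AAAACCGG", 4)
```

Repeating the calculation on a shuffled copy of the same sequence, for several window sizes, would give the natural-versus-shuffled comparison that the abstract uses to pick the 1 kb window.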

14.
The genomes of eukaryotes are mosaics of isochores. These are long DNA stretches that are fairly homogeneous in base composition and that belong to a small number of families characterized by different ratios of GC to AT and different short-sequence patterns (i.e., different DNA structures that interact with different proteins). This genome organization led to two discoveries: (1) the genomic code, which refers to two correlations, that of the composition of coding and contiguous noncoding sequences, and that of coding sequences and the structural properties of the encoded proteins; and (2) the genome phenotypes, which correspond to the patterns of isochore families in the genomes. These patterns indicate that genome evolution may proceed either according to a conservative mode or to a transitional (isochore shifting) mode, apparently depending upon whether the environment is constant or shifting. According to the neoselectionist theory, natural selection is responsible for both modes.

15.
Carels N  Bernardi G 《FEBS letters》2000,472(2-3):302-306
The base composition patterns of genes, coding sequences and gene expression levels were analyzed in the available long sequences (contigs) of Arabidopsis. Chromosome 5 was analyzed in detail and all chromosomes for which sequence data are now available show essentially the same large-scale compositional properties. Guanine+cytosine levels of genes and of their coding regions, as well as gene densities and expression levels, all show a marked tendency to be higher in the distal regions of Arabidopsis chromosomes.

16.
While veritable oceans of ink have been spilled over the base distributions within genes, the literature is virtually silent on large-scale intragenomic base distribution. To address this issue, we have examined approximately 3400 chromosomal sequences from approximately 2000 entire genomes, including DNA and RNA, single- and double-stranded, coding and non-coding genomes. For each sequence the mean, variance, skewness, and kurtosis for each base were computed along with the genome base composition. The main findings are: (1) there is no simple relationship between these statistics and the base composition of the genome, (2) in non-viral genomes, base distribution is non-uniform, (3) base distribution in non-eukaryotic genomes obeys a number of simple rules, (4) these rules are not dependent on the presence of coding sequences, (5) bacterial genomes in particular are unusually compliant with these rules, and (6) eukaryotes have a unique pattern of base distribution.

17.
In this paper, we highlight the affinity between the genomes of key representatives of the Pooideae subfamily, revealed at the chromosomal level by genomic in situ hybridization (GISH). The analyses were conducted using labeled probes from each species to hybridize with chromosomes of every species used in this study based on a “round robin” rule. As a result, the whole chromosomes or chromosome regions were distinguished or variable types of signals were visualized to prove the different levels of the relationships between genomes used in this study. We observed the unexpected lack of signals in secondary constrictions of rye (RR) chromosomes probed by triticale (AABBRR) genomic DNA. We have also identified unlabeled chromosome regions, which point to species-specific sequences connected with disparate pathways of chromosome differentiation. Our results revealed a conservative character of coding sequence of 35S rDNA among selected species of the genera Aegilops, Brachypodium, Festuca, Hordeum, Lolium, Secale, and Triticum. In summary, we showed strong relationships in genomic DNA sequences between species which have been previously reported to be phylogenetically distant.

18.
Plant genomes are complex and contain large amounts of repetitive DNA including microsatellites that are distributed across entire genomes. Whole genome sequences of several monocot and dicot plants that are available in the public domain provide an opportunity to study the origin, distribution and evolution of microsatellites, and also facilitate the development of new molecular markers. In the present investigation, a genome-wide analysis of microsatellite distribution in monocots (Brachypodium, sorghum and rice) and dicots (Arabidopsis, Medicago and Populus) was performed. A total of 797,863 simple sequence repeats (SSRs) were identified in the whole genome sequences of six plant species. Characterization of these SSRs revealed that mono-nucleotide repeats were the most abundant repeats, and that the frequency of repeats decreased with increase in motif length both in monocots and dicots. However, the frequency of SSRs was higher in dicots than in monocots both for nuclear and chloroplast genomes. Interestingly, GC-rich repeats were the dominant repeats only in monocots, with the majority of them being present in the coding region. These coding GC-rich repeats were found to be involved in different biological processes, predominantly binding activities. In addition, a set of 22,879 SSR markers that were validated by e-PCR were developed and mapped on different chromosomes in Brachypodium for the first time, with a frequency of 101 SSR markers per Mb. Experimental validation of 55 markers showed successful amplification of 80% SSR markers in 16 Brachypodium accessions. An online database 'BraMi' (Brachypodium microsatellite markers) of these genome-wide SSR markers was developed and made available in the public domain. The observed differential patterns of SSR marker distribution would be useful for studying microsatellite evolution in a monocot-dicot system. SSR markers developed in this study would be helpful for genomic studies in Brachypodium and related grass species, especially for the map based cloning of the candidate gene(s).

19.
Complete chromosome/genome sequences available from humans, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae were analyzed for the occurrence of mono-, di-, tri-, and tetranucleotide repeats. In all of the genomes studied, dinucleotide repeat stretches tended to be longer than other repeats. Additionally, tetranucleotide repeats in humans and trinucleotide repeats in Drosophila also seemed to be longer. Although the trends for different repeats are similar between different chromosomes within a genome, the density of repeats may vary between different chromosomes of the same species. The abundance or rarity of various di- and trinucleotide repeats in different genomes cannot be explained by nucleotide composition of a sequence or potential of repeated motifs to form alternative DNA structures. This suggests that in addition to nucleotide composition of repeat motifs, characteristic DNA replication/repair/recombination machinery might play an important role in the genesis of repeats. Moreover, analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids. The locations and sequences of all of the repeat loci detected in genome sequences and coding DNA sequences are available at http://www.ncl-india.org/ssr and could be useful for further studies.

20.

Background

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data.

Results

Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution.

Conclusions

While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号