Similar literature
Found 20 similar articles (search time: 46 ms)
1.
Repetitive DNA sequences derived from transposable elements (TE) are distributed in a non-random way, co-clustering with other classes of repeat elements, genes and other genomic components. In a previous work we reported power-law-like size distributions (linearity in log-log scale) in the spatial arrangement of Alu and LINE1 elements in the human genome. Here we investigate the large-scale features of the spatial arrangement of all principal classes of TEs in 14 genomes from phylogenetically distant organisms by studying the size distribution of inter-repeat distances. Power-law-like size distributions are found to be widespread, extending up to several orders of magnitude. In order to understand the emergence of this distributional pattern, we introduce an evolutionary scenario, which includes (i) insertions of DNA segments (e.g., more recent repeats) into the considered sequence and (ii) eliminations of members of the studied TE family. In the proposed model we also incorporate the potential for transposition events (characteristic of the DNA transposons' life-cycle) and segmental duplications. Simulations reproduce the main features of the observed size distributions. Furthermore, we investigate the effects of various genomic features on the presence and extent of power-law size distributions including TE class and age, mode of parental TE transmission, GC content, deletion and recombination rates in the studied genomic region, etc. Our observations corroborate the hypothesis that insertions of genomic material and eliminations of repeats are at the basis of power-laws in inter-repeat distances. The existence of these power-laws could facilitate the formation of the recently proposed "fractal globule" for the confined chromatin organization.
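The two events named in the abstract above, insertion of DNA segments and elimination of repeats, can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' model: the function names, the exponential starting spacers, the fixed `insert_len`, and the one-elimination-plus-one-insertion step are all hypothetical choices made for illustration. The second helper bins the resulting inter-repeat distances into powers of two, which is one simple way to inspect a size distribution on a log-log scale.

```python
import math
import random


def simulate_spacers(n_repeats=2000, n_steps=1500, insert_len=50.0, seed=0):
    """Toy insertion/elimination process on inter-repeat distances (spacers).

    Assumption: each step eliminates one random repeat (merging the two
    spacers around it) and inserts one segment into a spacer chosen with
    probability proportional to its length.
    """
    rng = random.Random(seed)
    # Start from randomly placed repeats: exponential spacers, each >= 1.
    spacers = [rng.expovariate(1 / 100) + 1 for _ in range(n_repeats)]
    for _ in range(n_steps):
        if len(spacers) < 2:
            break
        # Elimination: the repeat between spacer i and i + 1 disappears.
        i = rng.randrange(len(spacers) - 1)
        spacers[i] += spacers.pop(i + 1)
        # Insertion: longer spacers are proportionally more likely targets.
        j = rng.choices(range(len(spacers)), weights=spacers)[0]
        spacers[j] += insert_len
    return spacers


def log_binned_counts(values):
    """Count values per bin [2**k, 2**(k + 1)); expects values >= 1."""
    counts = {}
    for v in values:
        k = int(math.floor(math.log2(v)))
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for k, c in log_binned_counts(simulate_spacers()).items():
        print(f"[2^{k}, 2^{k + 1}): {c}")
```

Plotting the logarithm of each bin count (divided by the bin width) against `k` would show whether the toy process produces an approximately linear, power-law-like tail for a given parameter choice.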

2.
We present a model for genome evolution, comprising biologically plausible events such as transpositions inside the genome and insertions of exogenous sequences. This model attempts to formulate a minimal proposition accounting for key statistical properties of genomes, avoiding, as far as possible, unsupportable hypotheses for the remote evolutionary past. The statistical properties that are observed in genomic sequences and are reproduced by the proposed model are: (i) deviations from randomness at different length scales, measured by suitable algorithms, (ii) a special form of size distribution (power law distribution) characterising different levels of genome organisation in the non-coding, and (iii) extensive resemblance in the alternation of coding and non-coding regions at several length scales (self-similarity) in long genomic sequences of higher eukaryotes.

3.
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.

4.
MOTIVATION: The distributions of many genome-associated quantities, including the membership of paralogous gene families, can be approximated with power laws. We are interested in developing mathematical models of genome evolution that adequately account for the shape of these distributions and describe the evolutionary dynamics of their formation. RESULTS: We show that simple stochastic models of genome evolution lead to power-law asymptotics of protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced birth-and-death processes, in which domain duplication and deletion rates are asymptotically equal up to the second order. The simplest, linear BDIM shows an excellent fit to the observed distributions of domain family size in diverse prokaryotic and eukaryotic genomes. However, the stochastic version of the linear BDIM explored here predicts that the actual size of large paralogous families is reached on an unrealistically long timescale. We show that introduction of non-linearity, which might be interpreted as interaction of a particular order between individual family members, allows the model to achieve genome evolution rates that are much more compatible with the current estimates of the rates of individual duplication/loss events.

5.
6.
The transfer of organelle DNA fragments to the nuclear genome is frequently observed in eukaryotes. These transfers are thought to play an important role in gene and genome evolution of eukaryotes. In plants, such transfers occur from plastid to nuclear [nuclear plastid DNAs (NUPTs)] and mitochondrial to nuclear (nuclear mitochondrial DNAs) genomes. The amount and genomic organization of organelle DNA fragments have been studied in model plant species, such as Arabidopsis thaliana and rice. At present, publicly available genomic data can be used to conduct such studies in non-model plants. In this study, we analysed the amount and genomic organization of NUPTs in 17 plant species for which genome sequences are available. The amount and distribution of NUPTs varied among the species. We also estimated the distribution of NUPTs according to the time of integration (relative age) by conducting sequence similarity analysis between NUPTs and the plastid genome. The age distributions suggested that the present genomic constitutions of NUPTs could be explained by the combination of the rapidly eliminated deleterious parts and few but constantly existing less deleterious parts.

7.
Second G, Rouhan G. PLoS ONE, 2008, 3(7): e2613

Background

The genus Oryza is being used as a model in plant genomic studies although there are several issues still to be resolved regarding the spatio-temporal evolution of this ancient genus. Particularly contentious is whether undated transoceanic natural dispersal or recent human interference has been the principal agent determining its present distribution and differentiation. In this context, we studied the origin and distribution history of the allotetraploid CD rice genome. It is endemic to the Neotropics but the genus is thought to have originated in the Paleotropics, and there is relatively little genetic divergence between some orthologous sequences of the C genome component and their Old World counterparts.

Methodology/Principal Findings

Because of its allotetraploidy, there are several potential pitfalls in trying to date the formation of the CD genome using molecular data and this could lead to erroneous estimates. Therefore, we rather chose to rely on historical evidence to determine whether or not the CD genome was present in the Neotropics before the arrival of Columbus. We searched early collections of herbarium specimens and studied the reports of explorers of the tropical Americas for references to rice. In spite of numerous collectors traveling inland and collecting Oryza, plants determined as CD genome species were not observed away from cultivated rice fields until 1869. Various arguments suggest that they only consisted of weedy forms until that time.

Conclusions/Significance

The spatio-temporal distribution of herbarium collections fits a simple biogeographical scenario for the emergence in cultivated rice fields followed by radiation in the wild of the CD genome in the Neotropics during the last four centuries. This probably occurred from species introduced to the Americas by humans and we found no evidence that the CD genome pre-existed in the Old World. We therefore propose a new evolutionary hypothesis for such a recent origin of the CD genome. Moreover, we exemplify how an historical approach can provide potentially important information and help to disentangle the timing of evolutionary events in the history of the Oryza genomes.

8.
Using our previous result that the C--G distribution in genomes is very broad, varying as a power law of the size of the block of genome considered, we examine the C--G distribution in genes themselves. We show that the widths of the C--G distributions for the genes of several simple organisms also vary as power laws. This suggests that the power law behavior gives a universal scaling whereby the distributions for the C--G content of the genes from all species are mapped onto a single function.

9.
Characterization of reptilian genomes is essential for understanding the overall diversity and evolution of amniote genomes, because reptiles, which include birds, constitute a major fraction of the amniote evolutionary tree. To better understand the evolution and diversity of genomic characteristics in Reptilia, we conducted comparative analyses of online sequence data from Alligator mississippiensis (alligator) and Sphenodon punctatus (tuatara) as well as genome size and karyological data from a wide range of reptilian species. At the whole-genome and chromosomal tiers of organization, we find that reptilian genome size distribution is consistent with a model of continuous gradual evolution while genomic compartmentalization, as manifested in the number of microchromosomes and macrochromosomes, appears to have undergone early rapid change. At the sequence level, the third genomic tier, we find that exon size in Alligator is distributed in a pattern matching that of exons in Gallus (chicken), especially in the 101-200 bp size class. A small spike in the fraction of exons in the 301 bp-1 kb size class is also observed for Alligator, but more so for Sphenodon. For introns, we find that members of Reptilia have a larger fraction of introns within the 101 bp-2 kb size class and a lower fraction of introns within the 5-30 kb size class than do mammals. These findings suggest that the mode of reptilian genome evolution varies across three hierarchical levels of the genome, a pattern consistent with a mosaic model of genomic evolution.

10.
Global surveys of genomes measure the usage of essential molecular parts, defined here as protein families, superfamilies or folds, in different organisms. Based on surveys of the first 20 completely sequenced genomes, we observe that the occurrence of these parts follows a power-law distribution. That is, the number of distinct parts (F) with a given genomic occurrence (V) decays as $F = aV^{-b}$, with a few parts occurring many times and most occurring infrequently. For a given organism, the distributions of families, superfamilies and folds are nearly identical, and this is reflected in the size of the decay exponent b. Moreover, the exponent varies between different organisms, with those of smaller genomes displaying a steeper decay (i.e. larger b). Clearly, the power law indicates a preference to duplicate genes that encode for molecular parts which are already common. Here, we present a minimal, but biologically meaningful model that accurately describes the observed power law. Although the model performs equally well for all three protein classes, we focus on the occurrence of folds in preference to families and superfamilies. This is because folds are comparatively insensitive to the effects of point mutations that can cause a family member to diverge beyond detectable similarity. In the model, genomes evolve through two basic operations: (i) duplication of existing genes; (ii) net flow of new genes. The flow term is closely related to the exponent b and can accommodate considerable gene loss; however, we demonstrate that the observed data is reproduced best with a net inflow, i.e. with more gene gain than loss. Moreover, we show that prokaryotes have much higher rates of gene acquisition than eukaryotes, probably reflecting lateral transfer. A further natural outcome from our model is an estimation of the fold composition of the initial genome, which potentially relates to the common ancestor for modern organisms.
Supplementary material pertaining to this work is available from www.partslist.org/powerlaw.
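The decay law F = aV^(-b) quoted in the abstract above becomes a straight line after taking logarithms, log F = log a - b log V, so a rough estimate of b can be read off as the negated slope of a log-log fit. The sketch below assumes ordinary least squares on the logged values; this is an illustrative choice and not necessarily how the authors estimated the exponent. The function name `fit_power_law` and the example data are hypothetical.

```python
import math


def fit_power_law(occurrences, part_counts):
    """Estimate (a, b) in F = a * V**(-b) by least squares on log-log values.

    occurrences: genomic occurrence values V (all > 0)
    part_counts: number of distinct parts F observed at each V (all > 0)
    """
    xs = [math.log(v) for v in occurrences]
    ys = [math.log(f) for f in part_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope  # b is the negated slope


if __name__ == "__main__":
    # Synthetic data generated exactly from F = 100 * V**(-1.5).
    v_values = [1, 2, 4, 8, 16]
    f_values = [100 * v ** -1.5 for v in v_values]
    a, b = fit_power_law(v_values, f_values)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

On real counts the largest V values are sparse and noisy, so a log-log regression of this kind should be treated as a first look at the exponent only.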

11.

Background

The current versions of reference genome assemblies still contain gaps represented by stretches of Ns. Since high throughput sequencing reads cannot be mapped to those gap regions, the regions are depleted of experimental data. Moreover, several technology platforms assay a targeted portion of the genomic sequence, meaning that regions from the unassayed portion of the genomic sequence cannot be detected in those experiments. We here refer to all such regions as inaccessible regions, and hypothesize that ignoring these regions in the null model may increase false findings in statistical testing of colocalization of genomic features.

Results

Our explorative analyses confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps of human reference genomes (hg19 and hg38). The little intersection was observed only at the beginning and end portions of the gap regions. Further, we simulated a set of synthetic tracks by matching the properties of real genomic tracks in a way that nullified any true association between them. This allowed us to test our hypothesis that not avoiding inaccessible regions (as represented by assembly gaps) in the null model would result in spurious inflation of statistical significance. We contrasted the distributions of test statistics and p-values of Monte Carlo-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing colocalization between a pair of tracks. We observed that the statistical tests that did not account for assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (indicating inflated significance). We observed a similar level of inflated significance in hg19 and hg38, despite assembly gaps covering a smaller proportion of the latter reference genome.

Conclusion

We provide empirical evidence demonstrating that inaccessible regions, even when covering only a few percentages of the genome, can lead to a substantial amount of false findings if not accounted for in statistical colocalization analysis.
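The comparison described in the Results above, a Monte Carlo null model that either avoids or ignores inaccessible regions, can be sketched on a toy genome. Everything here is an assumption made for illustration: the function names, the point-in-interval overlap statistic, the uniform resampling of track positions, and the half-open `(start, end)` interval convention are not taken from the paper.

```python
import random


def in_intervals(pos, intervals):
    """True if pos lies in any half-open interval [start, end)."""
    return any(start <= pos < end for start, end in intervals)


def overlap_count(points, intervals):
    """Test statistic: number of points falling inside the intervals."""
    return sum(in_intervals(p, intervals) for p in points)


def permutation_p_value(points, intervals, genome_len, gaps,
                        avoid_gaps, n_perm=1000, seed=0):
    """One-sided Monte Carlo p-value for colocalization of two tracks.

    Null model: point positions are redrawn uniformly along the genome.
    With avoid_gaps=True, draws landing in assembly gaps are rejected.
    """
    rng = random.Random(seed)
    observed = overlap_count(points, intervals)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        while len(shuffled) < len(points):
            p = rng.randrange(genome_len)
            if avoid_gaps and in_intervals(p, gaps):
                continue  # inaccessible position: draw again
            shuffled.append(p)
        if overlap_count(shuffled, intervals) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gaps = [(0, 500)]           # first half of the toy genome is a gap
    track_b = [(500, 1000)]     # intervals cover all accessible sequence
    track_a = [600, 700, 800]   # points can only ever lie outside the gap
    for avoid in (True, False):
        p = permutation_p_value(track_a, track_b, 1000, gaps, avoid)
        print(f"avoid_gaps={avoid}: p = {p:.3f}")
```

In this deliberately extreme toy case the two tracks overlap only because both are confined to accessible sequence, so the gap-aware null gives p = 1, while the gap-ignoring null gives a much smaller p-value, which is the direction of inflation the abstract reports.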

12.
Jabbari K, Bernardi G. Gene, 2000, 247(1-2): 287-292
In the present work we show that in the Drosophila genome (which covers a 37-51% GC range at a DNA size of approx. 50 kb) a linear correlation holds between the GC (or GC3) of coding sequences and the GC of the approx. 50 kb genomic sequences embedding them. This correlation allows us to position the two compositional distributions of (a) coding sequences, and (b) of long DNA segments relative to each other and to calculate gene concentration across the compositional range of the Drosophila genome. Using this approach, we show that gene concentration increases with increasing GC of the regions embedding the genes, reaching a 7-fold higher level in the GC-richest regions compared with the GC-poorest regions. The gene distribution of the Drosophila genome is, therefore, similar to (although less striking than) that of the human genome, whereas it is very different from that of the Arabidopsis genome, which has about the same size as the Drosophila genome.

13.
We introduce and analyse a simple probabilistic model of genome evolution. It is based on three fundamental evolutionary events: gene loss, duplication and accumulated change. This is motivated by previous works which consisted in fitting the available genomic data into what are called paralog distributions. This formalism is described by a system of an infinite number of linear equations. We show that this system generates a semigroup of linear operators on the space $\ell^1$. We prove that the size distribution of paralogous gene families in a genome converges to the equilibrium as time goes to infinity. Moreover, we show that when the probabilities of gene removal and duplication are close to each other, the resulting distribution is close to the logarithmic distribution. Some empirical results for yeast genomes are presented.

14.
Feast and famine in plant genomes   Cited by: 25 (self-citations: 0, citations by others: 25)
Plant genomes vary over several orders of magnitude in size, even among closely related species, yet the origin, genesis and significance of this variation are not clear. Because DNA content varies over a sevenfold range among diploid species in the cotton genus (Gossypium) and its allies, this group offers opportunities for exploring patterns and mechanisms of genome size evolution. For example, the question has been raised whether plant genomes have a one-way ticket to genomic obesity, as a consequence of retroelement accumulation. Few empirical studies directly address this possibility, although it is consistent with recent insights gleaned from evolutionary genomic investigations. We used a phylogenetic approach to evaluate the directionality of genome size evolution among Gossypium species and their relatives in the cotton tribe (Gossypieae, Malvaceae). Our results suggest that both DNA content increase and decrease have occurred repeatedly during evolution. In contrast to a model of unidirectional genome size change, the frequency of inferred genome size contraction exceeded that of expansion. In conjunction with other evidence, this finding highlights the dynamic nature of plant genome size evolution, and suggests that poorly understood genomic contraction mechanisms operate on a more extensive scale than previously recognized. Moreover, the research sets the stage for fine-scale analysis of the evolutionary dynamics and directionality of change for the full spectrum of genomic constituents.

15.
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. In other similar ranked plots, this power-law function is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known chi-squared distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.

16.
Ancient demographic events can be inferred from the distribution of pairwise sequence differences (or mismatches) among individuals. We analyzed a database of 3,677 Y chromosomes typed for 11 biallelic markers in 48 human populations from Europe and the Mediterranean area. Contrary to what is observed in the analysis of mitochondrial polymorphisms, Tajima's test was insignificant for most Y-chromosome samples, and in 47 populations the mismatch distributions had multiple peaks. Taken at face value, these results would suggest either (1) that the size of the male population stayed essentially constant over time, while the female population size increased, or (2) that different selective regimes have shaped mitochondrial and Y-chromosome diversity, leading to an excess of rare alleles only in the mitochondrial genome. An alternative explanation would be that the 11 variable sites of the Y chromosome do not provide sufficient statistical power, so a comparison with mitochondrial data (where more than 200 variable sites are studied in Europe) is impossible at present. To discriminate between these possibilities, we repeatedly analyzed a European mitochondrial database, each time considering only 11 variable sites, and we estimated mismatch distributions in stable and growing populations, generated by simulating coalescent processes. Along with theoretical considerations, these tests suggest that the differences between the mismatch distributions inferred from mitochondrial and Y-chromosome data are not a statistical artifact. Therefore, the observed mismatch distributions appear to reflect different underlying demographic histories and/or selective pressures for maternally and paternally transmitted loci.

17.
Stegen JC, White EP. Ecology Letters, 2008, 11(12): 1287-1293
It has been suggested that frequency distributions of individual tree masses in natural stands are characterized by power-law distributions with exponents near -3/4, and that therefore tree communities exhibit energetic equivalence among size classes. Because the mass of trees is not measured directly, but estimated from diameter, this supposition is based on the fact that the observed distribution of tree diameters is approximately characterized by a power-law with an exponent approximately -2. Here we show that diameter distributions of this form are not equivalent to mass distributions with exponents of -3/4, but actually to mass distributions with exponents of -11/8. We discuss the implications of this result for the metabolic theory of ecology and for understanding energetic equivalence and the processes structuring tree communities.
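The step from a diameter exponent of -2 to a mass exponent of -11/8 can be reproduced with a change of variables. The abstract does not state the mass-diameter relation; the sketch below assumes the allometry $M \propto D^{8/3}$ commonly used in the metabolic theory of ecology, so the symbols $f_D$, $f_M$ and that scaling are assumptions, not quotations from the paper.

```latex
\begin{aligned}
f_D(D) &\propto D^{-2}, \qquad M \propto D^{8/3} \;\Rightarrow\; D \propto M^{3/8},\\
f_M(M) &= f_D\bigl(D(M)\bigr)\,\left|\frac{dD}{dM}\right|
        \propto \bigl(M^{3/8}\bigr)^{-2}\, M^{3/8-1}
        = M^{-3/4}\, M^{-5/8}
        = M^{-11/8}.
\end{aligned}
```

Under this assumption, substituting $D \propto M^{3/8}$ into $D^{-2}$ alone gives $M^{-3/4}$; the Jacobian factor $|dD/dM| \propto M^{-5/8}$ supplies the remainder of the exponent.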

18.
Using the complete genome of Thermoplasma volcanium, as an example, we have examined the distribution functions for the amount of C or G in consecutive, non-overlapping blocks of m bases in this system. We find that these distributions are very much broader (by many factors) than those expected for a random distribution of bases. If we plot the widths of the C-G distributions relative to the widths expected for random distributions, as a function of the block size used, we obtain a power law with a characteristic exponent. The broadening of the C-G distributions follows from the empirical finding that blocks containing a given C-G content tend to be followed by blocks of similar C-G content thus indicating a statistical persistence of composition. The exponent associated with the power law thus measures the strength of persistence in a given DNA. This behavior can be understood using Mandelbrot's model of a fractional Brownian walk. In this model there is a hierarchy of persistence (correlation between blocks) between all parts of the system. The model gives us a way to scale the C-G distributions such that all these functions are collapsed onto a master curve. For a fractional Brownian walk, the fractal dimension of the C-G distribution is simply related to the persistence exponent for the power law. The persistence exponent for T. volcanium is found to be gamma = 0.29 while for a 10 million base segment of the human genome we obtain gamma = 0.39, similar to but not identical with the value found for the microbe.
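The block analysis described in the abstract above can be sketched as follows: count C or G in consecutive non-overlapping blocks of m bases, then compare the observed spread of those counts with the spread expected if bases were placed independently. The binomial standard deviation sqrt(m p (1 - p)) used as the random baseline, and the function names, are assumptions for illustration; the abstract does not say how the random width was computed.

```python
import math


def block_gc_counts(seq, m):
    """C+G counts in consecutive, non-overlapping blocks of m bases."""
    return [sum(base in "GC" for base in seq[i:i + m])
            for i in range(0, len(seq) - m + 1, m)]


def width_ratio(seq, m):
    """Observed SD of block C+G counts divided by the binomial SD.

    Assumes the sequence contains both G/C and A/T bases, so that the
    overall C+G fraction p is strictly between 0 and 1.
    """
    counts = block_gc_counts(seq, m)
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    p = sum(base in "GC" for base in seq) / len(seq)
    random_sd = math.sqrt(m * p * (1 - p))
    return math.sqrt(variance) / random_sd


if __name__ == "__main__":
    toy = "GGCCATAT" * 4
    for m in (2, 4, 8):
        print(m, round(width_ratio(toy, m), 3))
```

Computing `width_ratio` over a range of block sizes on a real genome and fitting a line to log(ratio) against log(m) would give an estimate of the kind of persistence exponent the abstract calls gamma.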

19.
Several empirical studies have shown that the animal group size distribution of many species can be well fit by power laws with exponential truncation. A striking empirical result due to Niwa is that the exponent in these power laws is one and the truncation is determined by the average group size experienced by an individual. This distribution is known as the logarithmic distribution. In this paper we provide a first-principles derivation of the logarithmic distribution and other truncated power laws using a site-based merge and split framework. In particular, we investigate two such models. Firstly, we look at a model in which groups merge whenever they meet but split with a constant probability per time step. This generates a distribution similar, but not identical, to the logarithmic distribution. Secondly, we propose a model, based on preferential attachment, that produces the logarithmic distribution exactly. Our derivation helps explain why logarithmic distributions are so widely observed in nature. The derivation also allows us to link splitting and joining behavior to the exponent and truncation parameters in power laws.

20.
During the adaptation of an organism to a parasitic lifestyle, various gene functions may be rendered superfluous due to the fact that the host may supply these needs. As a consequence, obligate symbiotic bacterial pathogens tend to undergo reductive genomic evolution through gene death (nonfunctionalization or pseudogenization) and deletion. Here, we examine the evolutionary sequence of gene-death events during the process of genome miniaturization in three bacterial species that have experienced extensive genome reduction: Mycobacterium leprae, Shigella flexneri, and Salmonella typhi. We infer that in all three lineages, the distribution of functional categories is similar in pseudogenes and genes but different from that of absent genes. Based on an analysis of evolutionary distances, we propose a two-step "domino effect" model for reductive genome evolution. The process starts with a gradual gene-by-gene-death sequence of events. Eventually, a crucial gene within a complex pathway or network is rendered nonfunctional triggering a "mass gene extinction" of the dependent genes. In contrast to published reports according to which genes belonging to certain functional categories are prone to nonfunctionalization more frequently and earlier than genes belonging to other functional categories, we could discern no characteristic regularity in the temporal order of function loss.
