Similar articles
20 similar articles found (search time: 15 ms)
1.
Shannon’s information-theoretic perspective on communication helps one to understand the storage and processing of information in one-dimensional sequences. An information-theoretic analysis of 937 available completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Information content (Id) values were used to cluster these chromosomes. Chargaff’s second parity rule, i.e., compositional self-complementarity, is an empirical fact observed in all the genomes except that of the proteobacterium Candidatus Hodgkinia cicadicola. High information content arising from biased base composition is found in all 14 chromosomes of Plasmodium falciparum and in two prokaryotic genomes, viz. Buchnera aphidicola str. Cc (Cinara cedri) and Candidatus Carsonella ruddii PV. Despite size and compositional variations, neither prokaryotic nor eukaryotic genomes deviate significantly from an equiprobable and random situation. Eukaryotic chromosomes of an organism tend to have similar informational restraints, as seen when a simple distance-based method is used to cluster them. In eukaryotes, in certain cases, Id values are also similar for the two arms (p and q arm) of the chromosomes. The results of this study confirm that information content can provide insights into the clustering of genomes and the evolution of messaging strategies of the genomes. An efficient and robust standalone Perl CGI tool based on this information theory algorithm has been created for the analysis of whole genomes and is available at https://github.com/AlagurajVeluchamy/InformationTheory.
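As a rough illustration of the quantities this abstract mentions, the sketch below computes the Shannon entropy of a sequence's base composition and a single-strand check of Chargaff's second parity rule. The abstract does not define Id, so `divergence_from_equiprobability` (log2(4) minus the entropy) is only an assumed stand-in, and all function names are hypothetical rather than taken from the authors' Perl tool.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Return relative frequencies of A, C, G, T, ignoring any other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts.get(b, 0) / total for b in "ACGT"}


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero-frequency symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def divergence_from_equiprobability(seq):
    """Assumed stand-in for an information-content measure: log2(4) - H."""
    return 2.0 - shannon_entropy(base_frequencies(seq))


def parity_gap(seq):
    """Largest of |f(A) - f(T)| and |f(C) - f(G)| on a single strand.

    Values near zero are consistent with Chargaff's second parity rule.
    """
    f = base_frequencies(seq)
    return max(abs(f["A"] - f["T"]), abs(f["C"] - f["G"]))
```

A sequence with equal base frequencies gives a divergence of 0 bits, and a single-base sequence gives the maximum of 2 bits; real genomes would be read from FASTA files, which this sketch leaves out.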

2.
The nic2 mutation of Coprinus radiatus is unstable at meiosis. Strains derived from the initial mutant can be divided into two classes: the "neutral genomes", which all revert at meiosis with different frequencies but in an autonomous way; and the "aggressive genomes", which inhibit the reversion of neutral genomes, present a large polymorphism in their own reversion and lose their aggressiveness at meiosis, becoming neutral genomes. The characteristics and the relationships of these two strains are presented in this paper.

3.
Equal Symbol Fourier Transforms (FTES), characterizing nucleotide periodicity, comprise components of 5-D vectors that define base-repeat properties of a genomic sequence. This report describes a conversion of the FTES signals to a common platform of Shannon information content to facilitate comparisons of periodic data with other measures of information for genes and genomes. The autocorrelation used to compute the discrete FTES formed the basis to define repeating bases in terms of conditional probabilities. We derived a vector equation to express the Shannon information content of a sequence in a way that preserves the distinct specificity of base repeat patterns characterized by FTES vectors. We suggest application of such information vectors to study the structure of information in genes, chromosomes, and genomes by χ² comparisons.

4.
The complete base sequence of the HIV-1 virus and the GP120 ENV gene were analyzed to establish their distance from the expected neutral random sequence. A special methodology was devised to achieve this aim. Analyses included: a) proportion of dinucleotides (signatures); b) homogeneity in the distribution of dinucleotides and bases (isochores) by dividing both segments into ten and three sub-segments, respectively; c) probability of runs of bases and No-bases according to the Bose-Einstein distribution. The analyses showed a huge deviation from the random distribution expected from neutral evolution and neutral-neighbor influence of nucleotide sites. The most significant result is the tremendous lack of CG dinucleotides (p < 10^-50), a selective trait of eukaryotic genomes and not of single-stranded RNA virus genomes. The results not only refute neutral evolution and neutral-neighbor influence, but also strongly indicate that any base at any nucleotide site correlates with the entire viral genome or its sub-segments. These results suggest that evolution of HIV-1 is pan-selective rather than neutral or nearly neutral.
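The dinucleotide "signature" idea in this abstract can be sketched with a commonly used observed/expected odds ratio, f(XY) / (f(X) f(Y)). This is an illustrative measure only; the abstract does not state which statistic the authors used, and the function name is hypothetical.

```python
def dinucleotide_odds_ratio(seq, dinuc="CG"):
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for one dinucleotide.

    Ratios well below 1 indicate under-representation relative to what
    independent base frequencies alone would predict.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter dinucleotide")
    x, y = dinuc.upper()
    fx = seq.count(x) / n
    fy = seq.count(y) / n
    if fx == 0 or fy == 0:
        return 0.0
    # Count overlapping occurrences over the n - 1 adjacent positions.
    hits = sum(1 for i in range(n - 1) if seq[i:i + 2] == x + y)
    return (hits / (n - 1)) / (fx * fy)
```

For example, `dinucleotide_odds_ratio(genome, "CG")` on a genome string would return a value below 1 when CG is depleted; assessing significance, as the abstract does, would need a separate statistical test.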

5.
Extranuclear differentiation and gene flow in the finite island model (cited by: 15; self-citations: 8; cited by others: 7)
Takahata N, Palumbi SR. Genetics, 1985, 109(2): 441-457
Use of sequence information from extranuclear genomes to examine deme structure in natural populations has been hampered by lack of clear linkage between sequence relatedness and rates of mutation and migration among demes. Here, we approach this problem in two complementary ways. First, we develop a model of extranuclear genomes in a population divided into a finite number of demes. Sex-dependent migration, neutral mutation, unequal genetic contribution of separate sexes and random genetic drift in each deme are incorporated for generality. From this model, we derive the relationship between gene identity probabilities (between and within demes) and migration rate, mutation rate and effective deme size. Second, we show how within- and between-deme identity probabilities may be calculated from restriction maps of mitochondrial (mt) DNA. These results, when coupled with our results on gene flow and genetic differentiation, allow estimation of relative interdeme gene flow when deme sizes are constant and genetic variants are selectively neutral. We illustrate use of our results by reanalyzing published data on mtDNA in mouse populations from around the world and show that their geographic differentiation is consistent with an island model of deme structure.

6.
Comparison of closely related bacterial genomes has revealed the presence of highly conserved sequences forming a "backbone" that is interrupted by numerous, less conserved, DNA fragments. Segmentation of bacterial genomes into backbone and variable regions is particularly useful to investigate, among other things, bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these software tools and to compare the results obtained with them. To fill this gap, we have developed two local scores to measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The two scores described in this article provide useful information and are easy to implement, and their interpretation is intuitive. We show that they are suited to discriminate between robust and non-robust segmentations when genome aligners such as MAUVE and MGA are used.

7.
Shannon’s seminal approach to estimating information capacity is widely used to quantify information processing by biological systems. However, the Shannon information theory, which is based on power spectrum estimation, necessarily contains two sources of error: time delay bias error and random error. These errors are particularly important for systems with relatively large time delay values and for responses of limited duration, as is often the case in experimental work. The window function type and size chosen, as well as the values of inherent delays, cause changes in both the delay bias and random errors, with a possibly strong effect on the estimates of system properties. Here, we investigated the properties of these errors using white-noise simulations and analysis of experimental photoreceptor responses to naturalistic and white-noise light contrasts. Photoreceptors were used from several insect species, each characterized by different visual performance, behavior, and ecology. We show that the effect of random error on the spectral estimates of photoreceptor performance (gain, coherence, signal-to-noise ratio, Shannon information rate) is opposite to that of the time delay bias error: the former overestimates information rate, while the latter underestimates it. We propose a new algorithm for reducing the impact of time delay bias error and random error, based on finding and then using the window size at which these errors are equal in absolute value and opposite in sign, so that they cancel each other, allowing minimally biased measurement of neural coding.

8.
Shannon information is commonly assumed to be the wrong way in which to conceive of information in most biological contexts. Since the theory deals only in correlations between systems, the argument goes, it can apply to any and all causal interactions that affect a biological outcome. Since informational language is generally confined to only certain kinds of biological process, such as gene expression and hormone signalling, Shannon information is thought to be unable to account for this restriction. It is often concluded that a richer, teleosemantic sense of information is needed. I argue against this view, and show that a coherent and sufficiently restrictive theory of biological information can be constructed with Shannon information at its core. This can be done by paying due attention to some crucial distinctions: between information quantity and its fitness value, and between carrying information and having the function of doing so. From this I construct an account of how informational functions arise, and show that the “subject matter” of these functions can easily be seen as the natural information dealt with by Shannon’s theory.

9.
Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected values of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, this complex stochastic system for each model has an entropy expressible as a simple combination of well-known mathematical functions. Moreover, entropy- and heterozygosity-based measures for each model are linked by simple relationships that are shown by simulations to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations which follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information (“Shannon differentiation”) between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real world data from starlings.
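To make the measures named in this abstract concrete, the sketch below computes allele-frequency entropy, expected heterozygosity, and the mutual information between subpopulation identity and allele identity (pooled entropy minus the weighted mean within-subpopulation entropy). It is a minimal sketch of the standard definitions, not the authors' equilibrium expressions; function names and the equal-weight default are assumptions.

```python
import math


def entropy(freqs):
    """Shannon entropy (natural log) of a list of allele frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def heterozygosity(freqs):
    """Expected heterozygosity: one minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def mutual_information(subpops, weights=None):
    """Pooled entropy minus weighted mean within-subpopulation entropy.

    `subpops` is a list of allele-frequency lists over the same alleles;
    `weights` defaults to equal subpopulation weights (an assumption).
    """
    k = len(subpops)
    if weights is None:
        weights = [1.0 / k] * k
    n_alleles = len(subpops[0])
    pooled = [
        sum(w * pop[a] for w, pop in zip(weights, subpops))
        for a in range(n_alleles)
    ]
    within = sum(w * entropy(pop) for w, pop in zip(weights, subpops))
    return entropy(pooled) - within
```

Two subpopulations with identical allele frequencies share no information about subpopulation identity, while two subpopulations fixed for different alleles reach the maximum of ln 2 for equal weights.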

10.
We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the non-consensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have benchmarked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.

11.
The recent electron-microscopic and biochemical mapping of Z-DNA sites in phi X174, SV40, pBR322 and PM2 DNAs has been used to determine two sets of criteria for identification of potential Z-DNA sequences in natural DNA genomes. The prediction of potential Z-DNA tracts and corresponding statistical analysis of their occurrence have been made on a sample of 14 DNA genomes. Alternating purine and pyrimidine tracts longer than 5 base pairs and their clusters (quasi-alternating fragments) in the 14 genomes studied are under-represented compared to the expectation from corresponding random sequences. The fragments [d(G X C)]n and [d(C X G)]n (n ≥ 3) in general do not occur in circular DNA genomes and are under-represented in the linear DNAs of phages lambda and T7, whereas in linear genomes of adenoviruses they are strongly over-represented. With minor exceptions, potential Z-DNA sites are also under-represented compared to random sequences. In the 14 genomes studied, predicted Z-DNA tracts occur in non-coding as well as in protein coding regions. The predicted Z-DNA sites in phi X174, SV40, pBR322 and PM2 correspond well with those mapped experimentally. A complete listing together with a compact graphical representation of alternating purine-pyrimidine fragments and their Z-forming potential are presented.
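The first screening step described here, locating alternating purine-pyrimidine tracts longer than 5 base pairs, can be sketched as a single pass over the sequence. This covers only the tract search; the abstract's two sets of Z-DNA criteria are not specified there and are not reproduced, and the function name and `min_len` parameter are hypothetical.

```python
def alternating_tracts(seq, min_len=6):
    """Return (start, end, tract) for maximal alternating purine/pyrimidine runs.

    Purines are A and G, pyrimidines are C and T; any other symbol ends a run.
    `end` is exclusive, and only runs of at least `min_len` bases are reported.
    """
    seq = seq.upper()
    classes = ["R" if b in "AG" else "Y" if b in "CT" else None for b in seq]
    tracts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end, at an unknown base,
        # or where two neighbouring bases fall in the same class.
        run_breaks = (
            i == len(seq)
            or classes[i] is None
            or classes[i - 1] is None
            or classes[i] == classes[i - 1]
        )
        if run_breaks:
            if classes[start] is not None and i - start >= min_len:
                tracts.append((start, i, seq[start:i]))
            start = i
    return tracts
```

Comparing the number of tracts found in a genome with the number found in shuffled sequences of the same composition would be one way to approach the under-representation question the abstract raises.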

12.
Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, we analysed all known angiosperm chloroplast genomes available to date and then designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, the complete chloroplast genomes of eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provides an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs make it possible to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms.

13.
Lateral and oblique gene transfer (cited by: 13; self-citations: 0; cited by others: 13)
Sequence information from complete genomes, and from multiple loci of strains within species, is transforming the way that we investigate the evolution of bacteria. Such large-scale assessments of bacterial genomes have provided evidence of extensive gene transfer and exchange. Except in rare cases, these two processes do not seem to be coupled: certain species, such as Escherichia coli, undergo relatively low levels of gene exchange; but the emergence of pathogenic strains is associated with the acquisition of numerous virulence factors by lateral gene transfer.

14.
Picornaviruses are small animal RNA viruses and include etiological agents of poliomyelitis, foot and mouth disease, hepatitis A, etc. Replication of their genome results in many mutations, which are close in number to a viability threshold. Hence every virus population contains a great variety of genomes and represents a quasispecies. Covalent rearrangements (deletions, insertions, recombination) also contribute to genome variation and arise by replicative and nonreplicative mechanisms, which are still poorly understood. Only a minor fraction of all new changes is fixed during evolution. The fixation is based on two principally different ways of selection: with (positive and negative selection) and without (random selection of nonrepresentative variants) regard to the phenotype. In natural evolution of picornaviruses, the latter way is prevalent, and most fixed mutations are phenotypically neutral. To understand the mechanisms of evolution, it is necessary to evaluate the biological significance of particular genetic changes. Several new approaches to this problem have recently been proposed.

15.
16.
Genome Instability in Picornaviruses (cited by: 1; self-citations: 0; cited by others: 1)
Agol V. I. Molecular Biology, 2002, 36(2): 216-222
Picornaviruses are small animal RNA viruses and include etiological agents of poliomyelitis, foot and mouth disease, hepatitis A, etc. Replication of their genome results in many mutations, which are close in number to a viability threshold. Hence every virus population contains a great variety of genomes and represents a quasispecies. Covalent rearrangements (deletions, insertions, recombination) also contribute to genome variation and arise by replicative and nonreplicative mechanisms, which are still poorly understood. Only a minor fraction of all new changes is fixed during evolution. The fixation is based on two principally different ways of selection: with (positive and negative selection) and without (random selection of nonrepresentative variants) regard to the phenotype. In natural evolution of picornaviruses, the latter way is prevalent, and most fixed mutations are phenotypically neutral. To understand the mechanisms of evolution, it is necessary to evaluate the biological significance of particular genetic changes. Several new approaches to this problem have recently been proposed.

17.
18.
Biologists rely heavily on the language of information, coding, and transmission that is commonplace in the field of information theory developed by Claude Shannon, but there is open debate about whether such language is anything more than facile metaphor. Philosophers of biology have argued that when biologists talk about information in genes and in evolution, they are not talking about the sort of information that Shannon’s theory addresses. First, philosophers have suggested that Shannon’s theory is only useful for developing a shallow notion of correlation, the so-called “causal sense” of information. Second, they typically argue that in genetics and evolutionary biology, information language is used in a “semantic sense,” whereas semantics are deliberately omitted from Shannon’s theory. Neither critique is well-founded. Here we propose an alternative to the causal and semantic senses of information: a transmission sense of information, in which an object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X. The transmission sense not only captures much of what biologists intend when they talk about information in genes, but also brings Shannon’s theory back to the fore. By taking the viewpoint of a communications engineer and focusing on the decision problem of how information is to be packaged for transport, this approach resolves several problems that have plagued the information concept in biology, and highlights a number of important features of the way that information is encoded, stored, and transmitted as genetic sequence.

19.
Three metrics of species diversity – species richness, the Shannon index and the Simpson index – are still widely used in ecology, despite decades of valid critiques leveled against them. Developing a robust diversity metric has been challenging because, unlike many variables ecologists measure, the diversity of a community often cannot be estimated in an unbiased way based on a random sample from that community. Over the past decade, ecologists have begun to incorporate two important tools for estimating diversity: coverage and Hill diversity. Coverage is a method for equalizing samples that is, on theoretical grounds, preferable to other commonly used methods such as equal-effort sampling, or rarefying datasets to equal sample size. Hill diversity comprises a spectrum of diversity metrics and is based on three key insights. First, species richness and variants of the Shannon and Simpson indices are all special cases of one general equation. Second, richness, Shannon and Simpson can be expressed on the same scale and in units of species. Third, there is no way to eliminate the effect of relative abundance from estimates of any of these diversity metrics, including species richness. Rather, a researcher must choose the relative sensitivity of the metric towards rare and common species, a concept which we describe as ‘leverage.’ In this paper we explain coverage and Hill diversity, provide guidelines for how to use them together to measure species diversity, and demonstrate their use with examples from our own data. We show why researchers will obtain more robust results when they estimate the Hill diversity of equal-coverage samples, rather than using other methods such as equal-effort sampling or traditional sample rarefaction.
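The "one general equation" mentioned in this abstract is usually written as the Hill number of order q, D_q = (Σ p_i^q)^(1/(1−q)), with the q = 1 case taken as the limit exp(−Σ p_i ln p_i). The sketch below implements that standard form; the abstract's own 'leverage' parametrization and its coverage-based standardization are not reproduced, and the function name is hypothetical.

```python
import math


def hill_diversity(abundances, q):
    """Hill number of order q, in units of effective species.

    q = 0 gives species richness, q = 1 the exponential of Shannon entropy,
    and q = 2 the inverse Simpson concentration.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # Limit of the general formula as q approaches 1.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

For a perfectly even community of four species every order returns 4, while an uneven community such as `[90, 5, 5]` gives values that fall from 3 at q = 0 toward 1 as q increases, which is the shift in sensitivity from rare to common species that the abstract describes.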

20.
Little is known about how human cancers grow because direct observations are impractical. Cancers are clonal populations and the billions of cancer cells present in a visible tumor are progeny of a single transformed cell. Therefore, human cancers can be represented by somatic cell ancestral trees that start from a single transformed cell and end with billions of present day cancer cells. We use a genealogical approach to infer tumor growth from somatic trees, employing haplotype DNA methylation pattern variation, or differences between specific CpG sites or "tags," in the cancer genome. DNA methylation is an epigenetic mark that is copied, with error, during genome replication. At our tags, neutral copy errors in DNA methylation appear to occur at random, and much more frequently than sequence copy errors. To reconstruct a cancer tree, we sample and compare human colorectal genomes within small geographic regions (a cancer fragment), between fragments on the same side of the tumor, and between fragments from opposite tumor halves. The combined information on both physical distance and epigenetic distance informs our model for tumor ancestry. We use approximate Bayesian computation, a simulation-based method, to model tumor growth under a variety of evolutionary scenarios, estimating parameters that fit observed DNA methylation patterns. We conclude that methylation patterns sampled from human cancers are consistent with replication errors and certain simple cancer growth models. The inferred cancer trees are consistent with Gompertzian growth, a well-known cancer growth pattern.
