Found 20 similar articles (search time: 31 ms)
1.
Philipp W Messer Ralf Bundschuh Martin Vingron Peter F Arndt 《Journal of computational biology》2007,14(5):655-668
Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find that both the mean and the exponential tail of the score distribution for long-range correlated sequences are substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.
2.
Swati D 《Journal of biosciences》2007,32(6):1169-1184
Fast-sequencing throughput methods have increased the number of completely sequenced bacterial genomes to about 400 by December 2006, with the number increasing rapidly. These include several strains. In silico methods of comparative genomics are of use in categorizing and phylogenetically sorting these bacteria. Various word-based tools have been used for quantifying the similarities and differences between entire genomes. The simple di-nucleotide frequency comparison, codon specificity and k-mer repeat detection are among some of the well-known methods. In this paper, we show that the Mutual Information function, which is a measure of correlations and a concept from Information Theory, is very effective in determining the similarities and differences among genome sequences of various strains of bacteria such as the plant pathogen Xylella fastidiosa, marine Cyanobacteria Prochlorococcus marinus or animal and human pathogens such as species of Ehrlichia and Legionella. The short-range three-base periodicity, small sequence repeats and long-range correlations taken together constitute a genome signature that can be used as a technique for identifying new bacterial strains with the help of strains already catalogued in the database. There have been several applications of using the Mutual Information function as a measure of correlations in genomics but this is the first whole genome analysis done to detect strain similarities and differences. 相似文献
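The mutual information function named in this abstract can be illustrated with a short sketch. This is not the author's code: it assumes the standard information-theoretic definition, I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the probability of finding symbols a and b separated by k positions. The function name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(seq, k):
    """Mutual information in bits between symbols k positions apart.

    Assumption: marginals are estimated from the same set of pairs as the
    joint distribution, so the result is never negative.
    """
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# A perfectly periodic toy sequence: each symbol fully determines the
# symbol four positions later, so I(4) reaches log2(4) = 2 bits.
toy = "ACGT" * 50
print(mutual_information(toy, 4))
```

Evaluating the function over a range of k gives a profile in which features such as a three-base periodicity would appear as peaks at multiples of 3; how such profiles are combined into a genome signature is described only in the paper itself.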
3.
Evolutionary branching has been suggested as a mechanism to explain ecological speciation processes. Recent studies indicate however that demographic stochasticity and environmental fluctuations may prevent branching through stochastic competitive exclusion. Here we extend previous theory in several ways; we use a more mechanistic ecological model, we incorporate environmental fluctuations in a more realistic way and we include environmental autocorrelation in the analysis. We present a single, comprehensible analytical result which summarizes most effects of environmental fluctuations on evolutionary branching driven by resource competition. Corroborating earlier findings, we show that branching may be delayed or impeded if the underlying resources have uncorrelated or negatively correlated responses to environmental fluctuations. There is also a strong impeding effect of positive environmental autocorrelation, which can be related to results from recent experiments on adaptive radiation in bacterial microcosms. In addition, we find that environmental fluctuations can lead to cycles of repeated branching and extinction. 相似文献
5.
Benjamin Audit Cédric Vaillant Alain Arnéodo Yves d'Aubenton-Carafa Claude Thermes 《Journal of biological physics》2004,30(1):33-81
Previous analyses of genomic DNA sequences have shown that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) in fact result from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called "wavelet transform microscope" reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (≲ 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].
6.
7.
We have studied the presence of long-range correlations in the complete genomes of ten different dsDNA viruses and Saccharomyces cerevisiae (baker's yeast) chromosome I. We have also studied the correlation between the distribution of the gene length and the domain of the "1/f region" of their genomes. Linear regression analysis was done for the power-law region of these organisms and the slope values obtained were approximately -1, which signifies the existence of "1/f noise" in the low and medium (intermediate) frequency regions. This suggests the presence of long-range correlations in their genomes. The presence of 1/f noise in a given frequency interval indicates the existence of a fractal (self-similar) structure in the corresponding range of wavelengths. The results of our study suggest that genes have correlations within themselves, and the correlations appear to be related to the scaling exponent alpha.
8.
Audit B Vaillant C Arneodo A d'Aubenton-Carafa Y Thermes C 《Journal of molecular biology》2002,316(4):903-918
It has been established that the precise positioning of nucleosomes on genomic DNA can be achieved, at least for a minority of them, through sequence-dependent processes. However, to what extent DNA sequences play a role in the positioning of the major part of nucleosomes is still debated. The aim of the present study is to examine to what extent long-range correlations (LRC) are related to the presence of nucleosomes. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. The exploration of a number of eukaryotic and bacterial genomes through the optics of the so-called "wavelet transform microscope" reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. Here, we focus on the existence of LRC in the small-scale regime (10-200 bp) which are actually observed in eukaryotic genomes, in contrast to their absence in eubacterial genomes. Analysis of viral DNA genomes shows that, like their hosts' genomes, eukaryotic viruses present LRC but eubacterial viruses do not. There is one exception for genomes of poxviruses (Vaccinia and Melanoplus sanguinipes) which do not replicate in the cell nucleus and do not exhibit LRC. No small-scale LRC are detected in the genomes of all examined RNA viruses, with the exception of retroviruses. These results together with the observation of LRC between particular sequence motifs known to participate in the formation of nucleosomes (e.g. AA dinucleotides) strongly suggest that the 10-200 bp LRC are a signature of the sequence-dependence of nucleosome positioning. Finally, we discuss possible interpretations of these LRC in terms of the physical mechanisms that might govern the positioning and the dynamics of the nucleosomes along the DNA chain through cooperative processes.
9.
The genome-scale threading of five complete microbial genomes is revisited using our state-of-the-art threading algorithm, PROSPECTOR_Q. Considering that structure assignment to an ORF could be useful for predicting biochemical function as well as for analyzing pathways, it is important to assess the current status of genome-scale threading. The fraction of ORFs in each genome to which we could assign protein structures with a reasonably good confidence level is over 72%, which is significantly higher than in earlier studies. Using the assigned structures, we have predicted the function of several ORFs through "single-function" template structures, obtained from an analysis of the relationship between protein fold and function. The fold distribution of the genomes and the effect of the number of homologous sequences on structure assignment are also discussed.
10.
Tobias A. Knoch Markus Göker Rudolf Lohner Anis Abuseiris Frank G. Grosveld 《European biophysics journal : EBJ》2009,38(6):757-779
The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional organization of genomes is still a largely unresolved problem. Long-range power-law correlations were found using correlation analysis on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10⁶ to 3.0 × 10⁷ bp from Archaea, Bacteria, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behaviour: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (for Arabidopsis thaliana and Drosophila melanogaster divided in two submaxima), and often a region of one or more second maxima from 10⁵ to 3 × 10⁵ bp. Within this multi-scaling behaviour, an additional fine-structure is present and attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding. Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales. In summary, genomes show a complex sequential organization related closely to their three-dimensional organization.
This article has been submitted as a contribution to the festschrift entitled “Uncovering cellular sub-structures by light microscopy” in honor of Professor Cremer’s 65th birthday.
11.
The detection and quantification of long-range correlations in time series is a fundamental tool to characterize the properties of different dynamical systems, and is applied in many different fields, including physics, biology and engineering. Due to the diversity of applications, many techniques for measuring correlations have been designed. Here, we study systematically the influence of the length of a time series on the results obtained from several techniques commonly used to detect and quantify long-range correlations: the autocorrelation analysis, Hurst's analysis, and detrended fluctuation analysis (DFA). Using the Fourier filtering method, we generate artificial time series with known and controlled long-range correlations and with a broad range of lengths, and apply to them the different correlation measures we have studied. Our results indicate that while the DFA method is practically unaffected by the length of the time series, and almost always provides accurate results, the results from Hurst's analysis and the autocorrelation analysis strongly depend on the length of the time series.
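For readers unfamiliar with DFA, the following is a minimal sketch of the usual first-order variant. It is an illustration under stated assumptions, not the procedure used in the study: windows are non-overlapping, detrending is linear, and the test signal is plain white noise rather than a Fourier-filtered correlated series. The names `linear_fit` and `dfa_exponent` are hypothetical.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent alpha with linear detrending."""
    # Step 1: integrate the mean-subtracted series into a profile.
    mean = sum(series) / len(series)
    profile = []
    total = 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in window_sizes:
        xs = list(range(n))
        squared = 0.0
        count = 0
        # Step 2: remove a straight-line trend in each non-overlapping window.
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(xs, window)
            squared += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window)
            )
            count += n
        # Step 3: record log F(n), where F(n) is the RMS residual.
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(squared / count))
    # Step 4: alpha is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


rng = random.Random(0)  # fixed seed so the run is repeatable
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
alpha_noise = dfa_exponent(white_noise, [8, 16, 32, 64, 128, 256])
print(alpha_noise)  # uncorrelated noise is expected to give a value near 0.5
```

An exponent near 0.5 indicates no long-range correlation; values above 0.5 indicate persistent correlations.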
12.
All amino acid sequences derived from 248 prokaryotic genomes, 10 invertebrate genomes (plants and fungi) and 10 vertebrate genomes were analysed by the autocorrelation function of charge sequences. The analysis of the total amino acid sequences derived from the 268 biological genomes showed that a significant periodicity of 28 residues is observable for the vertebrate genomes, but not for the other genomes. When proteins with a charge periodicity of 28 residues (PCP28) were selected from the total proteomes, we found that PCP28 in fact exists in all proteomes, but the number of PCP28 is much larger for the vertebrate proteomes than for the other proteomes. Although excess PCP28 in the vertebrate proteomes are only poorly characterized, a detailed inspection of the databases suggests that most excess PCP28 are nuclear proteins. 相似文献
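The autocorrelation of a charge sequence can be sketched as follows. The charge assignment is an assumption made for illustration (K and R as +1, D and E as -1, everything else, including histidine, as 0); the abstract does not state the mapping the authors used. The names `CHARGE`, `charge_autocorrelation` and `example` are hypothetical, and the example protein is synthetic.

```python
# Assumed charge mapping; residues not listed are treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_autocorrelation(protein, max_lag):
    """Normalized autocorrelation of the charge sequence for lags 1..max_lag."""
    charges = [CHARGE.get(residue, 0) for residue in protein]
    n = len(charges)
    mean = sum(charges) / n
    var = sum((c - mean) ** 2 for c in charges) / n
    result = []
    for lag in range(1, max_lag + 1):
        cov = sum(
            (charges[i] - mean) * (charges[i + lag] - mean)
            for i in range(n - lag)
        ) / (n - lag)
        result.append(cov / var)
    return result


# Synthetic sequence built from a 28-residue unit with one positive and
# one negative cluster, repeated 20 times.
example = ("KKKK" + "A" * 10 + "EEEE" + "A" * 10) * 20
acf = charge_autocorrelation(example, 40)
best_lag = max(range(len(acf)), key=acf.__getitem__) + 1
print(best_lag)  # the strongest positive correlation falls at lag 28
```

A real proteome-wide analysis would aggregate such curves over many proteins; that step is not shown here.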
13.
Immunogenicity arises via many synergistic mechanisms, yet the overall dissimilarity of pathogenic proteins versus the host proteome has been proposed as a key arbiter. We have previously explored this concept in relation to bacterial antigens; here we extend our analysis to antigens of viral and fungal origin. Sets of known viral and fungal antigenic and non-antigenic protein sequences were compared to human and mouse proteomes. Both antigenic and non-antigenic sequences lacked human or mouse homologues. Observed distributions were compared using the non-parametric Mann-Whitney test. The statistical null hypothesis was accepted, indicating that antigens and non-antigens did not differ significantly. Likewise, we could not determine a threshold able to separate non-antigens from antigens meaningfully. We conclude that viral and fungal antigens cannot be predicted from pathogen genomes based solely on their dissimilarity to mammalian genomes.
14.
The genome of human immunodeficiency virus (HIV) has an average nucleotide composition strongly biased as compared to the human genome. The consequence of such nucleotide composition on HIV pathogenicity has not been investigated yet. To address this question, we analyzed the role of nucleotide bias of HIV-derived nucleic acids in stimulating type-I interferon response in vitro. We found that the biased nucleotide composition of HIV is detected in human cells as compared to humanized sequences, and triggers a strong innate immune response, suggesting the existence of cellular immune mechanisms able to discriminate RNA sequences according to their nucleotide composition or to detect specific secondary structures or linear motifs within biased RNA sequences. We then extended our analysis to the entire genome scale by testing more than 1300 HIV-1 complete genomes to look for an association between nucleotide composition of HIV-1 group M subtypes and their pathogenicity. We found that subtype D, which has an increased pathogenicity compared to the other subtypes, has the most divergent nucleotide composition relative to the human genome. These data support the hypothesis that the biased nucleotide composition of HIV-1 may be related to its pathogenicity. 相似文献
15.
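In December 2019, pneumonia caused by a coronavirus was reported in Wuhan, China. Its clinical symptoms differed from those of the severe acute respiratory syndrome (SARS) outbreak of 2003, so the virus was inferred to be a possible new variant of coronavirus. Unlike other studies that simply use whole-genome sequences, in 2018 we were the first internationally to propose a research approach that combines molecular function with evolutionary analysis, and we applied it to the genomes of Betacoronavirus subgroup B (BB coronaviruses). Guided by this approach, the present study used a complemented palindrome in the BB coronavirus genome (named the Nankai complemented palindrome) and the coding region that contains it (named the Nankai CDS) to analyze the newly released 2019 novel coronavirus genome (GenBank: MN908947), with the aim of tracing its origin accurately, and carried out a preliminary study of cross-species transmission and host adaptation in BB coronaviruses. The results of the source-tracing analysis support a bat origin for the 2019 novel coronavirus, which nevertheless differs greatly from SARS coronavirus; this is consistent with the difference in clinical symptoms between the two. The most important finding of this study is that BB coronaviruses exhibit abundant alternative translation, which reveals at the molecular level why they mutate rapidly and are highly diverse. Information obtained from alternative translation in BB coronaviruses can be applied to (but is not limited to) rapid detection, genotyping, vaccine development and drug design. In addition, we infer that BB coronaviruses may adapt to different hosts through alternative translation. Based on an empirical analysis of a large amount of genomic data, this study is the first internationally to attempt a molecular-level explanation of why BB coronaviruses mutate rapidly, infect many hosts and show strong host adaptability.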
16.
Mahta Rasouli Golta Rasouli Fredrick A. Lenz Donald S. Borrett Leo Verhagen Hon C. Kwan 《Journal of biological physics》2010,36(2):197-205
Many studies have demonstrated the presence of scale invariance and long-range correlation in animal and human neuronal spike trains. The methodologies to extract the fractal or scale-invariant properties, however, do not address the issue as to the existence within the train of fine temporal structures embedded in the global fractal organisation. The present study addresses this question in human spike trains by the chaos game representation (CGR) approach, a graphical analysis with which specific temporal sequences reveal themselves as geometric structures in the graphical representation. The neuronal spike train data were obtained from patients whilst undergoing pallidotomy. Using this approach, we observed highly structured regions in the representation, indicating the presence of specific preferred sequences of interspike intervals within the train. Furthermore, we observed that for a given spike train, the higher the magnitude of its scaling exponent, the more pronounced the geometric patterns in the representation and, hence, higher probability of occurrence of specific subsequences. Given its ability to detect and specify in detail the preferred sequences of interspike intervals, we believe that CGR is a useful adjunct to the existing set of methodologies for spike train analysis. 相似文献
17.
Fractal landscapes and molecular evolution: modeling the myosin heavy chain gene family. Cited by: 1 (self-citations: 1, citations by others: 0)
S V Buldyrev A L Goldberger S Havlin C K Peng H E Stanley M H Stanley M Simons 《Biophysical journal》1993,65(6):2673-2679
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests an increase in fractal complexity for MHC genes with evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. Both this new model and the DNA walk analysis support the intron-late theory of gene evolution.
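The "DNA walk" mapping can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: it uses one common purine/pyrimidine convention (step up for C or T, step down for A or G), and it treats any other symbol as a zero step. The names `dna_walk` and `rms_fluctuation` are hypothetical.

```python
import math


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G).

    Assumption: ambiguous or unknown symbols (e.g. N) contribute no step.
    """
    position = 0
    walk = []
    for base in seq.upper():
        if base in "CT":
            position += 1
        elif base in "AG":
            position -= 1
        walk.append(position)
    return walk


def rms_fluctuation(walk, length):
    """Root-mean-square fluctuation of the displacement over `length` steps."""
    deltas = [walk[i + length] - walk[i] for i in range(len(walk) - length)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


print(dna_walk("CCAG"))  # [1, 2, 1, 0]
```

In this kind of analysis the scaling exponent alpha is usually read from the slope of log F(l) against log l, where F(l) is the fluctuation computed by `rms_fluctuation`; a value of 0.5 corresponds to an uncorrelated sequence.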
18.
Many years ago compositional correlations were found to hold between coding and contiguous non-coding sequences. These correlations were essentially studied in whole genomes of mammals, which are characterized by strong compositional heterogeneities. Here we investigated whether these correlations also hold within the much more homogeneous isochore families. This point was checked not only in the case of mammals, but also in that of phylogenetically distant vertebrates, which are characterized by very different compositional patterns. Indeed, these are remarkably different in cold- and warm-blooded vertebrates. Fish genomes, for instance, are much more homogeneous than those of mammals and birds. The compositional correlations between coding sequences and the corresponding introns, or their 5′ and 3′ flanking regions, were studied in the isochore families of the fully sequenced genomes from four fishes (Brachydanio rerio, Oryzias latipes, Gasterosteus aculeatus and Tetraodon nigroviridis), human and chicken. 相似文献
19.