Similar literature
 Found 20 similar documents; search time: 15 ms
1.
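To study the multifractal properties of genomic sequences in depth, 12 long DNA sequences were first selected and, according to their coding/noncoding segments, converted into 12 corresponding time series. Multifractal Hurst analysis was then performed on these 12 time series: their Hurst exponents were computed and used to analyze the self-similarity of the sequences. The resulting Hurst exponents were further compared with the one-dimensional DNA walk model. All 12 sequences were found to have long-range correlations, which indicates that long-range correlation does exist in DNA sequences.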

2.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests an increase in fractal complexity for MHC genes with evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. Both this new model and the DNA walk analysis support the intron-late theory of gene evolution.
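
The DNA walk mapping and the scaling exponent alpha mentioned above can be sketched roughly as follows. This is only an illustration under assumptions the abstract does not state: the step rule (+1 for a pyrimidine, -1 for a purine), the plain root-mean-square fluctuation estimator, and the function names `dna_walk`, `fluctuation`, and `scaling_exponent` are all hypothetical choices, not the authors' implementation.

```python
import math

def dna_walk(seq):
    """Cumulative displacement of the walk.

    Assumed rule: +1 for a pyrimidine (C/T), -1 for a purine (A/G).
    Any other character (e.g. N) raises KeyError.
    """
    steps = {"C": 1, "T": 1, "A": -1, "G": -1}
    walk, y = [0], 0
    for base in seq.upper():
        y += steps[base]
        walk.append(y)
    return walk

def fluctuation(walk, l):
    """Root-mean-square fluctuation F(l) of the displacement over distance l."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))

def scaling_exponent(walk, lengths):
    """Least-squares slope of log F(l) versus log l, i.e. an estimate of alpha."""
    xs = [math.log(l) for l in lengths]
    ys = [math.log(fluctuation(walk, l)) for l in lengths]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For an uncorrelated sequence the estimate should lie near 0.5; values clearly above 0.5 would point to long-range correlations of the kind the abstract describes.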

3.
We investigated the differences in brain fMRI signal complexity in patients with schizophrenia while performing the Cyberball social exclusion task, using measures of Sample entropy and Hurst exponent (H). Thirteen patients meeting Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) criteria for schizophrenia and 16 healthy controls underwent fMRI scanning at 1.5 T. The fMRI data of both groups of participants were pre-processed, the entropy characterized and the Hurst exponent extracted. Whole brain entropy and H maps of the groups were generated and analysed. After adjusting for age and sex differences, the results show that patients with schizophrenia exhibited higher complexity than healthy controls, at mean whole brain and regional levels. Also, both Sample entropy and Hurst exponent agree that patients with schizophrenia have more complex fMRI signals than healthy controls. These results suggest that schizophrenia is associated with more complex signal patterns when compared to healthy controls, supporting the increase in complexity hypothesis, where system complexity increases with age or disease, and also consistent with the notion that schizophrenia is characterised by a dysregulation of the nonlinear dynamics of underlying neuronal systems.

4.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite many methods available for sequence comparison, few methods retain the information content of sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information and the Yau-Hausdorff distance is mathematically proven to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. The phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that Yau-Hausdorff distance is a natural metric for DNA and protein sequences with high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphic representations, as well as general two dimensional shape matching.

5.
The results obtained through biological research usually need to be analyzed using computational tools, since manual analysis becomes unfeasible due to the complexity and size of these results. For instance, the study of quasispecies frequently demands the analysis of several, very lengthy sequences of nucleotides and amino acids. Therefore, bioinformatics tools for the study of quasispecies are constantly being developed due to different problems found by biologists. In the present study, we address the development of a software tool for the evaluation of population diversity in quasispecies. Special attention is paid to the localization of genome regions prone to changes, as well as of possible hot spots.

6.
We present the analysis of a phase-shift sequence obtained from random transitions between periodic solutions of a biochemical dynamical model, formed by a system of three differential equations and which represent an instability-generating multienzymatic mechanism. The phase-shift series was studied in terms of Hurst’s rescaled range analysis. We found that the data were characterized by a Hurst exponent H = 0.69, which was clearly indicative of long-term trends. This result had a high significance level, as was confirmed through Monte Carlo simulations in which the data were scrambled in the series, destroying its original ordering. For these series we obtained a Hurst exponent which was consistent with the expectation of H = 0.5 for a random independent process. This clearly showed that, although the transitions between the periodic solutions were provoked randomly, the stochastic process obtained exhibited long-term persistence. The fractal dimension was also estimated and found to be consistent with the value of the Hurst exponent.
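
The rescaled range (R/S) analysis and the scrambling control described above might look roughly like the sketch below. It is an assumed, textbook-style formulation with non-overlapping windows; the names `rescaled_range`, `hurst_rs`, and `shuffled_hurst` are hypothetical, and nothing here reproduces the authors' data or their exact procedure.

```python
import math
import random

def rescaled_range(chunk):
    """R/S of one window: range of cumulative mean deviations over the std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / std if std > 0 else None

def hurst_rs(series, window_sizes):
    """Slope of log(mean R/S) versus log(window size): an estimate of H."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def shuffled_hurst(series, window_sizes, trials=20, seed=0):
    """Monte Carlo control: H estimates after destroying the original ordering."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        copy = list(series)
        rng.shuffle(copy)
        estimates.append(hurst_rs(copy, window_sizes))
    return estimates
```

If the estimate for the original series lies well above the whole range of shuffled estimates, the persistence can reasonably be attributed to the ordering of the data. Note that R/S estimates on short windows are known to be biased somewhat above 0.5 even for independent data, so the shuffled values are the fairer baseline.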

7.
Intuitively, the complexity of a given DNA sequence is related to the number of various superimposed biological messages it contains. Here we assess the expectation that in nucleosome DNA sequences of lower linguistic complexity, the nucleosome DNA positioning pattern would be more pronounced than in those of higher linguistic complexity. The nucleosome DNA positioning pattern is one of the weakest (highly degenerate) sequence patterns. It has been extracted recently by specially designed multiple alignment procedures. We applied the most sensitive of these procedures to nearly equal subsets of a nucleosome database separated according to linguistic complexity. The pattern extracted from the subset of the simpler nucleosome sequences not only possesses all major attributes of the known nucleosomal pattern, but is substantially stronger with respect to amplitude in comparison with the total database. This result constitutes the first demonstration that a weak pattern can be significantly enhanced by selective treatment of a lower complexity subset of the sequence ensemble under consideration.

8.
A previously formulated procedure for the quantitative evaluation of the complexities of molecules and biostructures is applied to assess the complexities of selected genomic DNA sequences. These include: (1) Several E. coli genes, including lacI, as examples of DNA sequences which are nearly as complex as possible (relative complexity ≈ 1). This is verified by the Lempel-Ziv (LZ) complexity analysis. (2) The telomere of a yeast chromosome, which has a considerable number of regular features that reduce complexity; the telomere shows indeed a lower structural complexity value. (3) A segment of human DNA, gene p53, which has a certain number of regular features such as 29 interspersed Alu elements; these features cause a certain reduction in the complexity of the p53 gene, but do not invalidate the (previous) overall conclusion that template complexity is very high. The close to maximal complexity of the transcribed regions of p53 is validated by the LZ compression analysis. The general conclusion is that DNA base sequence composition is the dominant factor determining cellular complexity. The high complexity of DNA arrived at is a direct consequence of the template character of DNA and reflects the role of genomic DNA as a principal regulating element of a cell. It will be a challenge to find systems of lower complexity with the ability to respond to challenges from the environment to the extent that DNA templated systems do. Cellular complexity and template directed activity are thus highly intertwined properties, at the heart of many developmental, behavioral and evolutionary processes.
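
To make the idea of a Lempel-Ziv check concrete, the following sketch counts phrases in an LZ78-style parse. The abstract does not say which LZ variant was used, so this is a swapped-in, simple stand-in rather than the authors' analysis, and `lz78_phrase_count` is a hypothetical name.

```python
def lz78_phrase_count(seq):
    """Number of phrases in an LZ78-style parse of seq.

    Each phrase is the shortest prefix of the remaining text that has not
    already been recorded as a phrase. Repetitive sequences need fewer
    phrases than irregular sequences of the same length.
    """
    phrases = set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing fragment that repeats an earlier phrase still counts once.
    return len(phrases) + (1 if current else 0)
```

Comparing the count for a sequence against counts for shuffled copies of the same sequence is one plausible way to express a relative complexity, with values near 1 meaning the sequence is about as irregular as its composition allows.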

9.
Complexity charts can be used to map functional domains in DNA   Total citations: 4 (self-citations: 0, citations by others: 4)
We measured local compositional complexity (LCC) of DNA sequences by calculating Shannon information content over mononucleotide frequencies. Eukaryotic DNA appeared to be "simpler" than bacterial DNA even at the level of short oligonucleotides. Moreover, different DNA functional domains displayed different compositional complexity in a systematic manner. In particular, the complexity of exon sequences was systematically higher than the complexity of corresponding introns. We therefore present examples of complexity charts (plots of complexity versus position in sequence) for pre-mRNA sequences from higher eukaryotes. By taking a window width of 100 nucleotides and a window step of 1 nucleotide, introns can be distinguished from exons in the majority of cases studied. Complexity charts of immunoglobulin variable regions allowed correct mapping of exons and introns in these sequences as well, a task that was impossible with commercial programs available to date.
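
A complexity chart of the kind described above can be approximated with a sliding-window Shannon entropy over mononucleotide frequencies. The window width of 100 and step of 1 come from the abstract; the exact LCC formula is not given there, so the plain entropy in bits used here, and the names `shannon_entropy` and `complexity_chart`, are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy in bits of the symbol frequencies in window."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def complexity_chart(seq, width=100, step=1):
    """List of (start position, entropy) pairs for each window along seq."""
    seq = seq.upper()
    return [
        (start, shannon_entropy(seq[start:start + width]))
        for start in range(0, len(seq) - width + 1, step)
    ]
```

With four nucleotides the entropy ranges from 0 bits (a homopolymer window) to 2 bits (all four bases equally frequent), so plotting the second element of each pair against the first gives the chart.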

10.
DNA gel-blot and in situ hybridization with genome-specific repeated sequences have proven to be valuable tools in analyzing genome structure and relationships in species with complex allopolyploid genomes such as hexaploid oat (Avena sativa L., 2n = 6x = 42; AACCDD genome). In this report, we describe a systematic approach for isolating genome-, chromosome-, and region-specific repeated and low-copy DNA sequences from oat that can presumably be applied to any complex genome species. Genome-specific DNA sequences were first identified in a random set of A. sativa genomic DNA cosmid clones by gel-blot hybridization using labeled genomic DNA from different Avena species. Because no repetitive sequences were identified that could distinguish between the A and D genomes, sequences specific to these two genomes are referred to as A/D genome specific. A/D or C genome-specific DNA subfragments were used as screening probes to identify additional genome-specific cosmid clones in the A. sativa genomic library. We identified clustered and dispersed repetitive DNA elements for the A/D and C genomes that could be used as cytogenetic markers for discrimination of the various oat chromosomes. Some analyzed cosmids appeared to be composed entirely of genome-specific elements, whereas others represented regions with genome- and non-specific repeated sequences with interspersed low-copy DNA sequences. Thus, genome-specific hybridization analysis of restriction digests of random and selected A. sativa cosmids also provides insight into the sequence organization of the oat genome.

11.
Summary: Besides the AT-specific fluorochromes, GC-specific fluorescent antibiotics are now available for chromosomal analysis. Chromosomal bands represent large accumulations of DNA sequences with a similar AT:GC ratio. These uniform differences from the mean AT:GC ratio in the bands can be explained only by at least partial repetition of short DNA sequences in these regions. Comparing various staining techniques also yields more information on the constitutive heterochromatin in man. The human NOR region exhibits a complex organization when studied by various base-specific fluorochromes and silver staining. The DNA-specific fluorochromes are also useful tools in cytophotometric DNA measurements.

12.
Frequency-domain analysis of biomolecular sequences   Total citations: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: Frequency-domain analysis of biomolecular sequences is hindered by their representation as strings of characters. If numerical values are assigned to each of these characters, then the resulting numerical sequences are readily amenable to digital signal processing. RESULTS: We introduce new computational and visual tools for biomolecular sequence analysis. In particular, we provide an optimization procedure improving upon traditional Fourier analysis performance in distinguishing coding from noncoding regions in DNA sequences. We also show that the phase of a properly defined Fourier transform is a powerful predictor of the reading frame of protein coding regions. Resulting color maps help in visually identifying not only the existence of protein coding areas for both DNA strands, but also the coding direction and the reading frame for each of the exons. Furthermore, we demonstrate that color spectrograms can visually provide, in the form of local 'texture', significant information about biomolecular sequences, thus facilitating understanding of local nature, structure and function.
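
The "traditional Fourier analysis" that the abstract improves upon is commonly taken to be the period-3 spectral measure computed from binary indicator sequences, one per base. The sketch below shows that baseline only; it does not implement the optimization procedure or the phase-based reading-frame predictor from the abstract. The names `period3_power` and `period3_profile`, the normalization by the squared length, and the default window of 351 bases are assumptions made for illustration.

```python
import cmath

def period3_power(seq):
    """Normalized spectral power at frequency 1/3, summed over the four bases.

    Each base gets a 0/1 indicator sequence; its DFT coefficient at period 3
    is the sum of exp(-2*pi*i*k/3) over the positions k where the base occurs.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / n ** 2

def period3_profile(seq, width=351, step=3):
    """(start, power) pairs for windows sliding along seq."""
    return [
        (start, period3_power(seq[start:start + width]))
        for start in range(0, len(seq) - width + 1, step)
    ]
```

Windows with a strong three-base periodicity, as is typical of protein coding regions, score higher than windows without it, so peaks in the profile are candidate coding stretches.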

13.
14.
SeqState     
Choosing and designing primers based on available DNA sequence data and statistical contrasting of domains or structural features is a common routine among molecular biologists. Currently available, free software tools were found to lack desirable features related to these tasks. This was the motivation for developing a new program, SeqState. SeqState locates regions that remain to be sequenced in phylogenetic DNA datasets, evaluates user-provided primers and selects primers best suited to fill gaps in the sequences. If the primers provided by the user are unsuitable, new primers are designed. Primers can be loaded from a primer database, be supplied as part of the alignment or be entered manually. The position of internal primers is automatically localised in the loaded data file. Primers can be edited, and changes and new primers can be saved to the database. Primer sheets allow the user to view internal dimers, complements to a second primer, mismatches to all loaded sequences, and other primer characteristics. Calculation of various sequence statistics can be requested for the whole dataset or parts thereof (character sets), with standard errors estimated by bootstrapping. Insertion-deletion events can be evaluated statistically and encoded for subsequent phylogenetic analysis according to several published coding principles.

15.
Selection of oligonucleotide probes for protein coding sequences   Total citations: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: Large arrays of oligonucleotide probes have become popular tools for analyzing RNA expression. However, to date most oligo collections contain poorly validated sequences or are biased toward untranslated regions (UTRs). Here we present a strategy for picking oligos for microarrays that focuses on a design universe consisting exclusively of protein coding regions. We describe the constraints in oligo design that are imposed by this strategy, as well as a software tool that allows the strategy to be applied broadly. RESULT: In this work we sequentially apply a variety of simple filters to candidate sequences for oligo probes. The primary filter is a rejection of probes that contain contiguous identity with any other sequence in the sample universe that exceeds a pre-established threshold length. We find that rejection of oligos that contain 15 bases of perfect match with other sequences in the design universe is a feasible strategy for oligo selection for probe arrays designed to interrogate mammalian RNA populations. Filters to remove sequences with low complexity and predicted poor probe accessibility narrow the candidate probe space only slightly. Rejection based on global sequence alignment is performed as a secondary, rather than primary, test, leading to an algorithm that is computationally efficient. Splice isoforms pose unique challenges and we find that isoform prevalence will for the most part have to be determined by analysis of the patterns of hybridization of partially redundant oligonucleotides. AVAILABILITY: The oligo design program OligoPicker and its source code are freely available at our website.
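
The primary filter described above, rejecting any probe that shares a 15-base perfect match with another sequence in the design universe, can be sketched with a k-mer index. This is not OligoPicker's code: the function names, the probe length of 50, and the decision to ignore reverse complements and the secondary filters are all assumptions made to keep the example short.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def select_probes(targets, probe_len=50, k=15):
    """Keep candidate probes that share no k-mer with any other target.

    targets: dict mapping a sequence name to its coding sequence.
    Returns a dict mapping each name to a list of (start, probe) pairs.
    """
    targets = {name: seq.upper() for name, seq in targets.items()}

    # Index every k-mer by the set of targets that contain it.
    index = {}
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)

    accepted = {}
    for name, seq in targets.items():
        hits = []
        for start in range(len(seq) - probe_len + 1):
            probe = seq[start:start + probe_len]
            # Reject the probe if any of its k-mers also occurs elsewhere.
            if all(index[kmer] == {name} for kmer in kmers(probe, k)):
                hits.append((start, probe))
        accepted[name] = hits
    return accepted
```

Because the index is built once and each probe is checked with set lookups, the contiguous-match test stays cheap, which is in the spirit of the abstract's point that alignment-based rejection can be left as a secondary step.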

16.
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel . Proteins 2016; 84:869–874. © 2016 Wiley Periodicals, Inc.

17.
The multifractal analysis of binary images of DNA is studied in order to define a methodological approach to the classification of DNA sequences. This method is based on the computation of some multifractality parameters on a suitable binary image of DNA, which takes into account the nucleotide distribution. The binary image of DNA is obtained by a dot-plot (recurrence plot) of the indicator matrix. The fractal geometry of these images is characterized by fractal dimension (FD), lacunarity, and succolarity. These parameters are compared with some other coefficients such as complexity and Shannon information entropy. It will be shown that the complexity parameters are more or less equivalent to FD, while the parameters of multifractality have different values in the sense that sequences with higher FD might have lower lacunarity and/or succolarity. In particular, the genome of Drosophila melanogaster has been considered by focusing on the chromosome 3r, which shows the highest fractality with a corresponding higher level of complexity. We will single out some results on the nucleotide distribution in 3r with respect to complexity and fractality. In particular, we will show that sequences with higher FD also have a higher frequency distribution of guanine, while low FD is characterized by the higher presence of adenine.

18.
High-throughput DNA sequencing technologies are increasingly becoming powerful systems for the comprehensive analysis of variations in whole genomes or various DNA libraries. As they are capable of producing massive collections of short sequences with varying lengths, a major challenge is how to turn these reads into biologically meaningful information. The first stage is to assemble the short reads into longer sequences through an in silico process. However, currently available software/programs allow only the assembly of abundant sequences, which apparently results in the loss of highly variable (or rare) sequences or creates artefact assemblies. In this paper, we describe a novel program (DNAseq) that is capable of assembling highly variable sequences and displaying them directly for phylogenetic analysis. In addition, this program is Microsoft Windows-based and runs on a standard PC with 700 MB of RAM for general use. We have applied it to analyse a human naive single-chain antibody (scFv) library, comprehensively revealing the diversity of antibody variable complementarity-determining regions (CDRs) and their families. Although only a scFv library was exemplified here, we envisage that this program could be applicable to other genome libraries.

19.
20.
MOTIVATION: Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function. RESULTS: We have developed a new tool, based on the SIMPLE algorithm, that facilitates the quantification of the amount of simple sequence in proteins and determines the type of short motifs that show clustering above a certain threshold. By modifying the sensitivity of the program simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins. AVAILABILITY: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE.
