Similar literature
20 similar articles found (search time: 31 ms)
1.
The leading eigenvalue of the matrix associated with a DNA sequence is an important invariant that is used effectively in the analysis of DNA sequences. Here, we propose a new invariant, based on the 2DD-curves of DNA sequences, that is simple to calculate. It can serve as an alternative invariant for characterizing a DNA sequence. The utility of the new parameter is illustrated on the DNA sequences of 11 species.

2.
An evolutionary model for maximum likelihood alignment of DNA sequences (total citations: 16; self-citations: 0; citations by others: 16)
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

3.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

4.
Goto N, Kurokawa K, Yasunaga T. Gene, 2007, 401(1-2): 172-180
To date, the complete genome sequences of more than 250 organisms have been determined. This information can now be used to determine whether there exist any invariant sequences that are conserved among all organisms, from bacteria to plants, animals, and humans. The existence of invariant sequences would strongly suggest that these sequences have been inherited unchanged from the last common ancestor of all life, and that they have essential functions. We have developed a new software program to identify invariant sequences conserved among the currently sequenced genomes and applied this analysis to the complete genome sequences of 266 organisms. We have identified 3 invariant DNA sequences longer than or equal to 11 bp and 6 invariant amino acid sequences longer than or equal to 6 aa. The longest invariant DNA sequence, AAGTCGTAACAAGGT (15 bp), was found in the 16S/18S rRNA gene. Two 8 aa sequences, GHVDHGKT in IF2 and EF-Tu and DTPGHVDF in EF-G, were the longest invariant amino acid sequences detected. These sequences could be essential elements from the genome of the last common ancestor and may have remained unchanged throughout evolution.
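
The search for invariant sequences described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: the function name `shared_kmers` and the toy sequences are hypothetical, and the sketch simply intersects the sets of length-k substrings of every input sequence, which would not scale to hundreds of complete genomes.

```python
def shared_kmers(sequences, k):
    """Return the set of length-k substrings present in every sequence."""
    common = None
    for seq in sequences:
        # All overlapping windows of length k in this sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        common = kmers if common is None else common & kmers
    return common or set()


# Hypothetical toy "genomes", for illustration only.
toy_genomes = ["TTGACCGTAAGG", "CCGACCGTATTT", "AAAGACCGTACC"]
print(shared_kmers(toy_genomes, 7))  # {'GACCGTA'}
print(shared_kmers(toy_genomes, 8))  # set()
```

Raising k until the result becomes empty gives the length of the longest invariant substring for the given inputs.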

5.
In this study, we examined whether evolutionarily driven differences in primary sequences correlate with, and thus predict, the genetic diversity of related marker loci, which is an important criterion for assessing the quality of any DNA marker. We adopted a new approach to quantitative symbolic DNA sequence analysis, called DNA random walk representation, to study multiallelic marker loci from Begonia × tuberhybrida Voss. We describe a significant correlation between random walk-derived digital invariants and the genetic diversity of the marker loci. Specifically, on the 3D contour plot of a multivariate principal component analysis (PCA), we found a statistical correlation between the first two PCA factors and the number of alleles per marker locus. Based on that correlation, we suggest that DNA walk representation may predict allele-rich loci solely from their primary sequences, which improves the current design of new DNA germplasm identifiers.
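
A DNA random walk turns a sequence into a numeric trajectory by taking one step per base. The abstract does not state which step rule was used, so the sketch below assumes the common purine/pyrimidine rule (A and G step up, C and T step down); the names `STEP` and `dna_walk` are hypothetical.

```python
from itertools import accumulate

# Assumed step rule (not taken from the abstract): purines +1, pyrimidines -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative walk position after each base of seq."""
    return list(accumulate(STEP[base] for base in seq.upper()))


print(dna_walk("AGCTTA"))  # [1, 2, 1, 0, -1, 0]
```

Summary statistics of such a trajectory (for example its range or final position) are the kind of numeric invariant that could then be fed into a PCA.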

6.
Comparative genomics is a powerful tool for predicting the functional specificity of genomes and for investigating the specificity of evolution. A large field of bioinformatics investigation rests on computing various scores for sequences and comparing them with a threshold. Comparative genomic analysis involves comparing scores across orthologous groups of genetic objects. In this paper we present a statistical approach to comparative genomic analysis that is based on investigating the diffusion in sequence space determined by the neutral evolution of sequences. Using this approach, we present several statistics for estimating selection pressure and analyze these statistics for several biological problems. We formulate a procedure for applying the statistics to obtain new biological information. The approach is implemented as a Java class library.

7.
Dai Q, Li L, Liu X, Yao Y, Zhao F, Zhang M. PLoS ONE, 2011, 6(11): e26779
Word-based models have achieved promising results in sequence comparison. However, how to use the overlapping structures and background information of words, which are important statistical properties of words in biological sequences, to improve sequence comparison remains an open problem. This paper proposes a new statistical method that integrates the overlapping structures and the background information of the words in biological sequences. To assess the effectiveness of this integration for sequence comparison, two sets of evaluation experiments were carried out to test the proposed model. The first, performed via receiver operating characteristic (ROC) curve analysis, applies the proposed method to discrimination between functionally related regulatory sequences and unrelated sequences, and between intron and exon sequences. The second evaluates the performance of the proposed method, using the F-measure, for clustering Hepatitis E virus genotypes. The results demonstrate that the proposed method, which integrates the overlapping structures and the background information of words, significantly improves biological sequence comparison and outperforms the existing models.
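
For orientation, a plain word-based comparison can be sketched as follows. This is only the kind of baseline that the proposed method builds on, not the method itself: it ignores both the overlapping structure of words and any background model. The function names and the choice of Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def word_counts(seq, k):
    """Count every overlapping word (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_distance(a, b, k=2):
    """Euclidean distance between the word-count vectors of a and b."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    # A Counter returns 0 for words it has not seen.
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys()))


print(word_counts("AAAA", 2))          # Counter({'AA': 3})
print(word_distance("ACGT", "ACGT"))   # 0.0
```

Because the counts come from overlapping windows, neighbouring words are statistically dependent, which is the property the abstract's method sets out to model explicitly.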

8.
DNA harvested directly from complex natural microbial communities by PCR has been successfully used to predict RNase P RNA structure, and can potentially provide an abundant source of information for structural predictions of other RNAs. In this study, we utilized genetic variation in natural communities to test and refine the secondary and tertiary structural model for the bacterial tmRNA. The variability of proposed tmRNA secondary structures in different organisms and the lack of any predicted tertiary structure suggested that further refinement of the tmRNA could be useful. To increase the phylogenetic representation of tmRNA sequences, and thereby provide additional data for statistical comparative analysis, we amplified, sequenced, and compared tmRNA sequences from natural microbial communities. Using primers designed from gamma proteobacterial sequences, we determined 44 new tmRNA sequences from a variety of environmental DNA samples. Covariation analyses of these sequences, along with sequences from cultured organisms, confirmed most of the proposed tmRNA model but also provided evidence for a new tertiary interaction. This approach of gathering sequence information from natural microbial communities seems generally applicable in RNA structural analysis.

9.
Here we propose a weighted measure for the similarity analysis of DNA sequences. It is based on LZ complexity and the (0,1) characteristic sequences of DNA sequences. This weighted measure enables biologists to extract similarity information from biological sequences according to their requirements. For example, with this weighted measure, one can obtain either the full similarity information or a similarity analysis from a given biological aspect. Moreover, the length of the DNA sequences is not a limitation. The application of the weighted measure to the similarity analysis of β-globin genes from nine species shows its flexibility.
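
The two ingredients named in this abstract can be sketched under stated assumptions. The abstract does not define the characteristic sequences or the weights, so the sketch assumes the three binary classifications often used in this literature (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) and the exhaustive-history form of Lempel-Ziv (1976) complexity. The weighting step itself is not reproduced, and all names are hypothetical.

```python
# Assumed base classes mapped to "1"; all other bases map to "0".
CLASSES = {"RY": "AG", "MK": "AC", "WS": "AT"}


def characteristic(seq, kind):
    """Map a DNA sequence to a (0,1) string for one base classification."""
    ones = CLASSES[kind]
    return "".join("1" if base in ones else "0" for base in seq.upper())


def lz_complexity(s):
    """Number of components in the exhaustive-history LZ parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from the text
        # that precedes its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


print(characteristic("ATGGTGCACC", "RY"))   # 1011010100
print(lz_complexity("0001101001000101"))    # 6
```

The second call reproduces the standard textbook parsing 0 · 001 · 10 · 100 · 1000 · 101, which has six components.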

10.
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses; hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding.

11.
Each diploid organism has two alleles at every gene locus. In sexual organisms such as most plants, animals and fungi, the two alleles in an individual may be genetically very different from each other. DNA sequence data from individual alleles (called a haplotype) can provide powerful information to address a variety of biological questions and guide many practical applications. Advances in molecular technology and computational tools over the last decade have made obtaining large-scale haplotypes feasible. This review summarizes the two basic approaches for obtaining haplotypes and discusses the associated techniques and methods. The first approach is to experimentally obtain diploid sequence information and then use computer algorithms to infer haplotypes. The second approach is to obtain haplotype sequences directly through experimentation. The advantages and disadvantages of each approach are discussed. I then discuss a specific example of how the direct approach was used to obtain haplotype information to address several fundamental biological questions about a pathogenic yeast. With increasing sophistication in both bioinformatics tools and high-throughput molecular techniques, haplotype analysis is becoming an integrated component of biomedical research.

12.
Pathways database system: an integrated system for biological pathways (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: During the next phase of the Human Genome Project, research will focus on functional studies of attributing functions to genes, their regulatory elements, and other DNA sequences. To facilitate the use of genomic information in such studies, a new modeling perspective is needed to examine and study genome sequences in the context of many kinds of biological information. Pathways are the logical format for modeling and presenting such information in a manner that is familiar to biological researchers. RESULTS: In this paper we present an integrated system, called Pathways Database System, with a set of software tools for modeling, storing, analyzing, visualizing, and querying biological pathways data at different levels of genetic, molecular, biochemical and organismal detail. The novel features of the system include: (a) genomic information integrated with other biological data and presented from a pathway, rather than from the DNA sequence, perspective; (b) design for biologists who are possibly unfamiliar with genomics, but whose research is essential for annotating gene and genome sequences with biological functions; (c) database design, implementation and graphical tools which enable users to visualize pathways data in multiple abstraction levels, and to pose predetermined queries; and (d) an implementation that allows for web(XML)-based dissemination of query outputs (i.e. pathways data) to researchers in the community, giving them control on the use of pathways data. AVAILABILITY: Available on request from the authors.

13.
14.
An annotated bibliography of mathematical and computer analyses of protein and nucleic acid sequences is presented. The major subject areas represented are the determination of sequences, restriction mapping, similarity searching, sequence alignment, codon utilization, statistical analysis, information theoretic analysis, the construction of secondary and tertiary structure and DNA topology.

15.
16.
P Dube, P Tavares, R Lurz, M van Heel. The EMBO Journal, 1993, 12(4): 1303-1309
Electron microscopy in combination with image processing is a powerful method for obtaining structural information on non-crystallized biological macromolecules at the 10-50 Å resolution level. The processing of noisy microscopical images requires advanced data processing methodologies in which one must carefully avoid the introduction of any form of bias into the data set. Using a novel multivariate statistical approach to the analysis of symmetry, we studied the structure of the bacteriophage SPP1 portal protein oligomer. This portal structure, ubiquitous in icosahedral bacteriophages which package dsDNA, is located at the site of symmetry mismatch between a 5-fold vertex of the icosahedral shell and the 6-fold symmetric (helical) tail. From previous studies such 'head-to-tail connector' structures were generally accepted to be homododecamers assembled in a 12-fold symmetric ring around a central channel. Using a new analysis methodology we have found that the phage SPP1 portal structure exhibits 13-fold cyclical symmetry: a new point group organization for oligomeric proteins. A model for the DNA packaging mechanism by 13-fold symmetric portal protein assemblies is presented which attributes a coherent functional meaning to their unusual symmetry.

17.
The concept of nucleic acid sequence base alternations is presented. The number of base alterations for the sequences of different length is established. The definition of "enlarged similarity" of nucleic acids sequences on the basis of sequence base alterations is introduced. Mutual information between sequences is used as a quantitative measure of enlarged similarity for two compared sequences. The method of mutual information calculation is developed considering the correlation of bases in compared sequences. The definitions of correlated similarity and evolution similarity between compared sequences are given. Results of the use of enlarged similarity approach for DNA sequences analysis are discussed.
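
Mutual information between two sequences can be illustrated with a minimal sketch. It assumes the two sequences have equal length and are compared position by position, and it uses the plain empirical estimate; the abstract's own calculation, which also accounts for correlations between bases, is not reproduced here. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between paired positions."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    pair_counts = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    # Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        c / n * log2(c * n / (x_counts[a] * y_counts[b]))
        for (a, b), c in pair_counts.items()
    )


print(mutual_information("ACGT", "ACGT"))  # 2.0
print(mutual_information("AACC", "ACAC"))  # 0.0
```

Identical sequences with uniform base usage give the maximum of 2 bits, while statistically independent pairings give 0.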

18.
19.
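A new measure of DNA sequence information (total citations: 4; self-citations: 3; citations by others: 1)
Based on information theory, a new method for measuring the information of DNA sequences is given, yielding information measures at four levels of a DNA sequence: Ib, If(1), If(2), and If(3). These four measures can be used to measure the information content of the DNA base sequence, the codon sequence, the encoded protein sequence, and the functional protein sequence, respectively. Results computed from two relatively short protein-coding DNA sequences in the mitochondrial genome of M. edulis, and from simulated DNA sequences composed of degenerate codon groups with different degrees of degeneracy, show that these information measures can indeed be used to reveal ...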

20.