Similar literature
 20 similar documents found (search time: 15 ms)
1.
Characterization of protein primary sequences based on partial ordering   Total citations: 1 (self-citations: 0, citations by others: 1)
In this paper, we present a new approach to characterizing protein sequences. Based on orderings of the 20 natural amino acids that reflect some of their physico-chemical properties, we construct an augmented Hasse matrix for each protein sequence. The normalized leading eigenvalues of these matrices are then computed and used as invariants of the protein sequences. Finally, we compare the similarity/diversity of nine different protein sequences.

2.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.
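The counting step described in this abstract can be sketched roughly as follows. The three-letter alphabet H/E/C, the function names, and the arrangement of the 81 word frequencies into nine 3 × 3 matrices (one per two-letter prefix) are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from itertools import product

ALPHABET = "HEC"  # assumed states: helix, strand, coil


def four_tuple_frequencies(seq):
    """Relative frequency of every overlapping four-letter word in seq."""
    words = [seq[i:i + 4] for i in range(len(seq) - 3)]
    counts = Counter(words)
    total = len(words)
    all_words = ("".join(w) for w in product(ALPHABET, repeat=4))
    return {w: (counts[w] / total if total else 0.0) for w in all_words}


def prefix_matrices(freqs):
    """Hypothetical arrangement: one 3x3 matrix per two-letter prefix,
    rows indexed by the third letter, columns by the fourth."""
    return {
        a + b: [[freqs[a + b + c + d] for d in ALPHABET] for c in ALPHABET]
        for a, b in product(ALPHABET, repeat=2)
    }


# Toy secondary structure string (made up for the example).
freqs = four_tuple_frequencies("HHHHEEEECCCC")
matrices = prefix_matrices(freqs)
```

The leading eigenvalue of each matrix would then be computed with any eigenvalue routine; that step is left out here.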

3.
A phylogenetic invariant for a model of biological sequence evolution along a phylogenetic tree is a polynomial that vanishes on the expected frequencies of base patterns at the terminal taxa. While the use of these invariants for phylogenetic inference has long been of interest, explicitly constructing such invariants has been problematic. We construct invariants for the general Markov model of κ-base sequence evolution on an n-taxon tree, for any κ and n. The method depends primarily on the observation that certain matrices defined in terms of expected pattern frequencies must commute, and yields many invariants of degree κ+1, regardless of the value of n. We define strong and parameter-strong sets of invariants, and prove several theorems indicating that the set of invariants produced here has these properties on certain sets of possible pattern frequencies. Thus our invariants may be sufficient for phylogenetic applications.

4.
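Based on a 3D graphical representation of DNA sequences, each DNA sequence is characterized by a 3-dimensional vector composed of the normalized leading eigenvalues of its L/L matrices. Using this method, the similarity of 11 species is analyzed with the first exon of the β-globin gene.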

5.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

6.
We find that the traditional numerical characterizations of biological sequences, such as the E matrix, D/D matrix, L/L matrix and their "high order" matrices, cannot characterize biological sequences exactly, although they are widely used to analyze them. Here, we propose a better numerical characterization for graphical representations of biological sequences, the C(i,j) matrix. It is associated with the curvature at every point and has several advantages: (1) It can characterize the graphical representations of DNA sequences exactly, because it overcomes the limitation of the traditional matrices. (2) If we choose an appropriate fixed point, we can make the elements of the C(i,j) matrix less than or equal to 1.

7.
It is known that if all the Markov transition matrices that govern the substitution of one nucleotide for another satisfy six linear constraints, then equations can be derived that permit one to infer evolutionary trees from nucleic acid sequences by the method of linear invariants. These sufficient conditions are also necessary. Any relaxation of them results in the loss of all linear invariants. Necessary conditions for any given set of linear invariants can be derived by examining conditions a matrix must satisfy to map a certain set of matrices into itself. To the extent that necessary conditions are incorrect, a method is not reliable. In a world where different parts of molecules evolve at different rates, the two-parameter model of Kimura may not be empirically distinguishable from the more general one treated here.

8.
9.
Information about conformational properties of a protein is contained in the hydrophobicity values of the amino acids in its primary sequence. We have investigated the possibility of extracting meaningful evolutionary information from the comparison of the hydrophobicity values of the corresponding amino acids in the sequences of homologous proteins. Distance matrices for six families of homologous proteins were made on the basis of the differences in hydrophobicity values of the amino acids. The phylogenetic trees constructed from such matrices were at least as good (as judged from their faithful reflection of evolutionary relationships) as trees constructed from the usual minimum mutation distance matrix.
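A minimal sketch of such a hydrophobicity-based distance matrix is given below. It assumes the Kyte-Doolittle hydropathy scale and a mean absolute difference over aligned positions; the abstract names neither the scale nor the exact distance formula, and the toy sequences and function names are made up for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_distance(a, b):
    """Mean absolute hydropathy difference between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(abs(KD[x] - KD[y]) for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Pairwise distances as a nested dict keyed by sequence name."""
    names = list(seqs)
    return {p: {q: hydrophobicity_distance(seqs[p], seqs[q]) for q in names}
            for p in names}


# Toy aligned sequences, not real homologous proteins.
seqs = {"s1": "AVLK", "s2": "AILK", "s3": "GDEK"}
dm = distance_matrix(seqs)
```

A conservative substitution such as V to I yields a small distance here, while replacing hydrophobic residues with charged ones yields a large one. Gap handling is omitted.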

10.
In this paper, we propose a simple method for analyzing the similarity of biological sequences. Taking the average contents of biological sequences and their information entropies as variables, we use a fuzzy method to cluster them. The applications show that the method is relatively easy and fast. Unlike other approaches such as graphical representation methods, for which computing invariants of the matrices derived from the graphical representation is usually very complex, our method pays more attention to the information in the biological sequences themselves. With the help of software (SPSS), it is especially convenient. It may therefore be used to study new biological sequences, for example their evolutionary relationships and structures.
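The two kinds of variables mentioned in this abstract can be computed along the following lines. This is only a sketch: it assumes DNA sequences over A/C/G/T and Shannon entropy in bits, and the function names are hypothetical. The fuzzy clustering step itself is not shown.

```python
import math
from collections import Counter


def base_contents(seq, alphabet="ACGT"):
    """Fraction of the sequence made up by each symbol of the alphabet."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in alphabet}


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def feature_vector(seq):
    """Contents of A, C, G, T followed by the entropy, ready for clustering."""
    return list(base_contents(seq).values()) + [shannon_entropy(seq)]


# Toy sequence, made up for the example.
features = feature_vector("AACGTTGA")
```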

11.
12.
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.

13.
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an Abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.

14.
MOTIVATION: The general-time-reversible (GTR) model is one of the most popular models of nucleotide substitution because it constitutes a good trade-off between mathematical tractability and biological reality. However, when it is applied for inferring evolutionary distances and/or instantaneous rate matrices, the GTR model seems more prone to inapplicability than more restrictive time-reversible models. Although it has been previously noted that the intractability is caused by the impossibility of computing the logarithm of a matrix characterised by negative eigenvalues, the issue has not been investigated further. RESULTS: Here, we formally characterize the mathematical conditions that lead to the inapplicability of the GTR model, and discuss their biological interpretation. We investigate the relations between, on the one hand, the occurrence of negative eigenvalues and, on the other hand, both sequence length and sequence divergence. We then propose a possible re-formulation of previous procedures in terms of a non-linear optimization problem. We analytically investigate the effect of our approach on the estimated evolutionary distances and transition probability matrix. Finally, we provide an analysis of the goodness of the solution we propose. A numerical example is discussed.
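One simple way to see when the matrix logarithm must fail is the determinant: since det(exp(Qt)) = exp(t·tr(Q)) > 0, an estimated transition matrix P with det(P) ≤ 0 cannot be the exponential of any real rate matrix. The sketch below applies this necessary (not sufficient) check; it is not the procedure of the paper, and the two matrices and all function names are made up for illustration.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over rational numbers."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def real_log_may_exist(p):
    """Necessary condition only: a real logarithm requires det(P) > 0."""
    return determinant(p) > 0


t = Fraction(1, 10)
# Hypothetical estimate at moderate divergence (diagonally dominant).
p_ok = [[7 * t, t, t, t], [t, 7 * t, t, t], [t, t, 7 * t, t], [t, t, t, 7 * t]]
# Hypothetical noisy estimate: first two rows swapped, so det < 0.
p_bad = [[t, 7 * t, t, t], [7 * t, t, t, t], [t, t, 7 * t, t], [t, t, t, 7 * t]]
```

A negative determinant implies an odd number of negative real eigenvalues, which is the situation the abstract describes.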

15.
《BIOSILICO》2003,1(3):89-96
The function(s) of a novel gene or gene product can be inferred by associating the gene or gene product with those whose functions are known. It is now common practice to associate two genes if they have similar sequences. In recent years, computational methods have been developed that associate genes on the basis of features beyond similarity, using a variety of biological data beyond single-gene sequences. This review describes several promising methods that associate genes or gene products. These associative methods employ similarity of sequences and structures, features from whole-genome analysis, co-expression patterns from microarray and EST data, interacting properties from proteomic data, and links from literature mining. Finally, we outline issues surrounding the validation and integration of these methods.

16.
Properties of spectral components of the system matrix of linear time-invariant discrete or continuous models are investigated. It is shown that the entries in these matrices have the interpretation of being the sensitivity of the system matrix eigenvalues with respect to the model parameters. The spectral resolution formula for linear operators is used to get explicit results about component matrices and eigenvalue sensitivity. In biological modeling, particular interest is in the real maximal or minimal roots of the system matrix. Exact formulation of the related spectral components is made in important system matrix cases such as companion, Leslie, ecosystem, compartmental, and stochastic matrices.
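For a simple dominant eigenvalue λ with left eigenvector v and right eigenvector w, the standard sensitivity formula is ∂λ/∂a_ij = v_i w_j / (v·w). The sketch below evaluates it for a small Leslie matrix using power iteration. The matrix entries and function names are invented for the example, and power iteration is an assumption that only suits primitive non-negative matrices.

```python
def dominant_pair(a, iterations=500):
    """Power iteration: dominant eigenvalue and right eigenvector
    (normalized to sum 1) of a primitive non-negative matrix."""
    n = len(a)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(aw)  # equals the eigenvalue at convergence because sum(w) == 1
        w = [x / lam for x in aw]
    return lam, w


def eigenvalue_sensitivity(a):
    """Dominant eigenvalue and the matrix of sensitivities d(lambda)/d(a_ij)."""
    n = len(a)
    lam, w = dominant_pair(a)                            # right eigenvector
    _, v = dominant_pair([list(r) for r in zip(*a)])     # left eigenvector
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return lam, [[v[i] * w[j] / dot for j in range(n)] for i in range(n)]


# Hypothetical two-age-class Leslie matrix: fecundities 1 and 2, survival 0.5.
leslie = [[1.0, 2.0], [0.5, 0.0]]
lam, sens = eigenvalue_sensitivity(leslie)
```

For this matrix the characteristic equation is λ² − λ − 1 = 0, so the sensitivity to the first fecundity can be checked by hand as λ/(2λ − 1).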

17.
The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assimilation of empirical amino acid replacement probabilities into a codon-substitution matrix. In this way, the resulting codon model takes into account not only the transition-transversion bias and the nonsynonymous/synonymous ratio, but also the different amino acid replacement probabilities as specified in empirical amino acid matrices. Different empirical amino acid replacement matrices, such as secondary structure-specific matrices or organelle-specific matrices (e.g., mitochondria and chloroplasts), can be incorporated into the model, making it context dependent. Using a diverse set of coding DNA sequences, we show that the novel model better fits biological data as compared with either mechanistic or empirical codon models. Using the suggested model, we further analyze human immunodeficiency virus type 1 protease sequences obtained from drug-treated patients and reveal positive selection in sites that are known to confer drug resistance to the virus.

18.
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \geq 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \neq j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \geq 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the "common HO GSVD subspace." We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.

19.
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper, we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the online type, and they generalize classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor that is proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
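For reference, the brute-force O(mn) baseline mentioned in this abstract can be sketched as below. This is not one of the paper's fast algorithms. The toy matrix (match score 2, mismatch score -1 for the motif TATA), the fixed score threshold standing in for a p-value cutoff, and the function name are all assumptions made for illustration.

```python
def scan_pwm(pwm, sequence, threshold):
    """Slide the matrix over the sequence and report (start, score)
    for every window whose summed score reaches the threshold.

    pwm is a list with one dict per motif position, mapping symbol -> score.
    """
    m = len(pwm)
    hits = []
    for start in range(len(sequence) - m + 1):       # n - m + 1 windows
        score = sum(pwm[k][sequence[start + k]] for k in range(m))  # m lookups each
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy matrix for the motif TATA: 2 for the expected base, -1 otherwise.
toy_pwm = [{b: (2 if b == c else -1) for b in "ACGT"} for c in "TATA"]

strict_hits = scan_pwm(toy_pwm, "GGTATACCTATT", threshold=6)
loose_hits = scan_pwm(toy_pwm, "GGTATACCTATT", threshold=5)
```

With the stricter threshold only the exact occurrence at position 2 is reported; lowering it also admits the one-mismatch window at position 8.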

20.

Background

There are several common ways to encode a tree as a matrix, such as the adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of the natural random walk), and the matrix of pairwise distances between leaves. Such representations involve a specific labeling of the vertices or at least the leaves, and so it is natural to attempt to identify trees by some feature of the associated matrices that is invariant under relabeling. An obvious candidate is the spectrum of eigenvalues (or, equivalently, the characteristic polynomial).

Results

We show for any of these choices of matrix that the fraction of binary trees with a unique spectrum goes to zero as the number of leaves goes to infinity. We investigate the rate of convergence of the above fraction to zero using numerical methods. For the adjacency and Laplacian matrices, we show that the a priori more informative immanantal polynomials have no greater power to distinguish between trees.

Conclusion

Our results show that a generic large binary tree is highly unlikely to be identified uniquely by common spectral invariants.
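The relabeling invariance that motivates spectral invariants can be checked directly on a small tree. The sketch below computes the characteristic polynomial of the adjacency matrix exactly (using the Faddeev-LeVerrier recurrence, chosen here for convenience) and compares two labelings of the same tree. The example tree, the permutation, and the function names are made up for illustration.

```python
from fractions import Fraction


def adjacency(n, edges):
    """Adjacency matrix of an undirected graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(xI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(a)
    m = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(a, m)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs


edges = [(0, 1), (1, 2), (1, 3), (3, 4)]   # a small example tree
perm = [3, 0, 4, 1, 2]                     # an arbitrary relabeling of its vertices
original = char_poly(adjacency(5, edges))
relabeled = char_poly(adjacency(5, [(perm[u], perm[v]) for u, v in edges]))
```

Equal polynomials for the two labelings illustrate the invariance. The abstract's point is the converse failure: distinct large trees can also share the same polynomial.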
