Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
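
The combination step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `combine_distance_matrices`, the layout (one row per gene, one column per taxon pair), and the rescaling to the mean gene distance are all assumptions made here for the example.

```python
import numpy as np

def combine_distance_matrices(mats):
    """Combine symmetric distance matrices via the leading right singular vector.

    Hypothetical helper: the row/column layout and the final rescaling are
    illustrative assumptions, not taken from the abstract.
    """
    mats = [np.asarray(m, dtype=float) for m in mats]
    n = mats[0].shape[0]
    iu = np.triu_indices(n, k=1)
    # One row per gene, one column per taxon pair (upper triangle only).
    X = np.vstack([m[iu] for m in mats])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    v = Vt[0]
    if v.sum() < 0:
        # Singular vectors are only defined up to sign.
        v = -v
    # Put the combined distances on the scale of the average gene distance.
    v = v * (X.mean() / v.mean())
    D = np.zeros((n, n))
    D[iu] = v
    return D + D.T, s
```

The returned singular values `s` give a rough sense of how much signal lies outside the common component: if the later values are not small relative to the first, the genes may disagree.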

2.
Among the statistical methods available to control for phylogenetic autocorrelation in ecological data, those based on eigenfunction analysis of the phylogenetic distance matrix among the species are becoming increasingly important tools. Here, we evaluate a range of criteria to select eigenvectors extracted from a phylogenetic distance matrix (using phylogenetic eigenvector regression, PVR) that can be used to measure the level of phylogenetic signal in ecological data and to study correlated evolution. We used a principal coordinate analysis to represent the phylogenetic relationships among 209 species of Carnivora by a series of eigenvectors, which were then used to model log‐transformed body size. We first conducted a series of PVRs in which we increased the number of eigenvectors from 1 to 70, following the sequence of their associated eigenvalues. Second, we also investigated three non‐sequential approaches based on the selection of 1) eigenvectors significantly correlated with body size, 2) eigenvectors selected by a standard stepwise algorithm, and 3) the combination of eigenvectors that minimizes the residual phylogenetic autocorrelation. We mapped the mean specific component of body size to evaluate how these selection criteria affect the interpretation of non‐phylogenetic signal in Bergmann's rule. For comparison, the same patterns were analyzed using autoregressive model (ARM) and phylogenetic generalized least‐squares (PGLS). Despite the robustness of PVR to the specific approaches used to select eigenvectors, using a relatively small number of eigenvectors may be insufficient to control phylogenetic autocorrelation, leading to flawed conclusions about patterns and processes. The method that minimizes residual autocorrelation seems to be the best choice according to different criteria. 
Thus, our analyses show that, when the best criterion is used to control phylogenetic structure, PVR can be a valuable tool for testing hypotheses related to heritability at the species level, phylogenetic niche conservatism and correlated evolution between ecological traits.

3.
Backbone cluster identification in proteins by a graph theoretical method   Cited by: 4 (self-citations: 0, citations by others: 4)
A graph theoretical algorithm has been developed to identify backbone clusters of residues in proteins. The identified clusters show protein sites with the highest degree of interactions. An adjacency matrix is constructed from the non-bonded connectivity information in proteins. The diagonalization of such a matrix yields eigenvalues and eigenvectors, which contain the information on clusters. In graph theory, distinct clusters can be obtained from the second lowest eigenvector components of the matrix. However, in an interconnected graph, all the points appear as one single cluster. We have developed a method of identifying highly interacting centers (clusters) in proteins by truncating the vector components of high eigenvalues. This paper presents in detail the method adopted for identifying backbone clusters and the application of the algorithm to families of proteins like RNase-A and globin. The objective of this study was to show the efficiency of the algorithm as well as to detect conserved or similar backbone packing regions in a particular protein family. Three clusters in topologically similar regions in the case of the RNase-A family and three clusters around the porphyrin ring in the globin family were observed. The predicted clusters are consistent with the features of the family of proteins such as the topology and packing density. The method can be applied to problems such as identification of domains and recognition of structural similarities in proteins.

4.
The conditional autoregressive model and the intrinsic autoregressive model are widely used as prior distributions for random spatial effects in Bayesian models. Several authors have pointed out impractical or counterintuitive consequences on the prior covariance matrix or the posterior covariance matrix of the spatial random effects. This article clarifies many of these puzzling results. We show that the neighborhood graph structure, synthesized in the eigenvalue and eigenvector structure of a matrix associated with the adjacency matrix, determines most of the apparently anomalous behavior. We illustrate our conclusions with regular and irregular lattices including lines, grids, and lattices based on real maps.
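
As a small illustration of how the graph structure drives such behavior, the sketch below builds a precision matrix for a line (path) lattice. The parameterization Q = tau (D - rho W), with D the diagonal matrix of neighbor counts, is a common textbook form and is an assumption here; the abstract does not state which parameterization the article uses. The helper names are hypothetical.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of a line lattice with n sites."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return W

def car_precision(W, rho, tau=1.0):
    """Assumed CAR precision Q = tau * (D - rho * W); rho = 1 gives the intrinsic model."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

W = path_adjacency(5)
Q_car = car_precision(W, rho=0.9)
Q_icar = car_precision(W, rho=1.0)

car_eigs = np.linalg.eigvalsh(Q_car)      # all positive: a proper prior
icar_eigs = np.linalg.eigvalsh(Q_icar)    # smallest is zero: an improper prior
marginal_var = np.diag(np.linalg.inv(Q_car))
```

In this toy case the prior marginal variance is not constant across sites: the two end sites, which have a single neighbor, get a larger variance than the middle site, even though `rho` and `tau` are the same everywhere.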

5.
The “spread” of the nonzero eigenvalues of a compartmental matrix is studied by reference to the associated directed graph. It is related to the eigenvalues of the matrices of the individual cycles for certain strongly connected directed graphs. The equilibrium solution to the entire model is also an equilibrium solution to the model consisting of the individual cycles.

6.
7.
8.
Ecological and evolutionary studies are often concerned with the properties of covariance matrices. The method of random skewers (RS method) has been used to compare a matrix to an a priori vector or to compare two matrices. The method involves multiplying a matrix by many random vectors drawn from a uniform distribution over all possible vector directions. The comparisons are usually made using the average angle (or cosine) of the response vectors to an a priori vector or to the corresponding response vectors from another matrix. Angles are usually constrained to the interval 0°–90° because the distribution of response vectors is bipolar bimodal. The size of the average angle or cosine depends strongly on the relative sizes of the eigenvalues (especially the first). The distribution of angles between pairs of response vectors from two covariance matrices is more complicated because it depends on the differences in orientation of the eigenvectors and the relative sizes of the eigenvalues of both matrices. The average absolute value of the angles between these pairs of response vectors depends on the relative sizes of the eigenvalues of the matrices, making it difficult to interpret without knowledge of the eigenvalues and eigenvectors of the two matrices. Thus, it is simpler to compare matrices directly in terms of these quantities.
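
A two-matrix random skewers comparison might look like the following sketch. The function name, the default number of skewers, and the choice to report the plain mean cosine (rather than a mean angle) are assumptions for illustration. Normalized Gaussian vectors are used because they are uniformly distributed over directions.

```python
import numpy as np

def random_skewers(A, B, n_skewers=1000, seed=0):
    """Mean cosine between the response vectors A v and B v over random unit vectors v."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    # Normalizing standard normal draws gives directions uniform on the sphere.
    v = rng.standard_normal((n_skewers, p))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    ra = v @ A.T   # row i holds A @ v[i]
    rb = v @ B.T   # row i holds B @ v[i]
    cos = np.sum(ra * rb, axis=1) / (
        np.linalg.norm(ra, axis=1) * np.linalg.norm(rb, axis=1)
    )
    return cos.mean()
```

Two diagonal matrices with the same eigenvalues but swapped eigenvectors, such as diag(100, 1) and diag(1, 100), give a mean cosine close to zero, while any matrix compared with a rescaled copy of itself gives 1. This is the dependence on eigenvalue sizes and eigenvector orientation that the abstract describes.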

9.
A number of metrics have been developed for estimating phylogenetic signal in data and to evaluate correlated evolution, inferring broad-scale evolutionary and ecological processes. Here, we propose an approach called the phylogenetic signal-representation (PSR) curve, built upon phylogenetic eigenvector regression (PVR). In PVR, selected eigenvectors extracted from a phylogenetic distance matrix are used to model interspecific variation. In the PSR curve, sequential PVR models are fitted after successively increasing the number of eigenvectors and plotting their R² against the accumulated eigenvalues. We used simulations to show that a linear PSR curve is expected under Brownian motion and that its shape changes under alternative evolutionary models. The PSR area, expressing deviations from Brownian motion, is strongly correlated (r = 0.873; P < 0.01) with Blomberg's K statistic, so nonlinear PSR curves reveal whether traits are evolving at a slower or faster rate than expected under Brownian motion. The PSR area is also correlated with phylogenetic half-life under an Ornstein-Uhlenbeck process, suggesting how both methods describe the shape of the relationship between interspecific variation and time since divergence among species. The PSR curve provides an elegant exploratory method to understand deviations from Brownian motion, in terms of acceleration or deceleration of evolutionary rates occurring at large or small phylogenetic distances.

10.
11.
In this paper, a new method for QRS complex analysis and estimation based on principal component analysis (PCA) and polynomial fitting techniques is presented. Multi-channel ECG signals were recorded and QRS complexes were obtained from every channel and aligned perfectly in matrices. For every channel, the covariance matrix was calculated from the QRS complex data matrix of many heartbeats. Then the corresponding eigenvectors and eigenvalues were calculated and reconstruction parameter vectors were computed by expansion of every beat in terms of the principal eigenvectors. These parameter vectors show short-term fluctuations that have to be discriminated from abrupt changes or long-term trends that might indicate diseases. For this purpose, first-order poly-fit methods were applied to the elements of the reconstruction parameter vectors. In healthy volunteers, subsequent QRS complexes were estimated by calculating the corresponding reconstruction parameter vectors derived from these functions. The similarity, absolute error and RMS error between the original and predicted QRS complexes were measured. Based on this work, thresholds can be defined for changes in the parameter vectors that indicate diseases.
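
The pipeline above (covariance, eigenvectors, per-beat parameter vectors, first-order fit, prediction) can be sketched for a single channel as follows. This is an illustrative reading of the abstract, not the authors' code: the function names, the use of the beat index as the regression variable, and the mean-centering step are assumptions.

```python
import numpy as np

def pca_parameters(beats, k):
    """Expand each beat (one row of `beats`) in the k leading covariance eigenvectors."""
    mean = beats.mean(axis=0)
    centered = beats - mean
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; keep the k largest.
    basis = vecs[:, np.argsort(vals)[::-1][:k]]
    params = centered @ basis          # one parameter vector per beat
    return mean, basis, params

def predict_next_beat(beats, k=2):
    """Fit a straight line to each parameter over the beat index and extrapolate one beat."""
    mean, basis, params = pca_parameters(beats, k)
    idx = np.arange(len(beats))
    # For 2-D input polyfit fits each column; row 0 is the slope, row 1 the intercept.
    coeffs = np.polyfit(idx, params, deg=1)
    next_params = coeffs[0] * len(beats) + coeffs[1]
    return mean + basis @ next_params
```

A large gap between a predicted beat and the beat actually recorded would then flag an abrupt change in the parameter vectors, which is the kind of event the thresholds mentioned in the abstract are meant to detect.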

12.
We propose a new method to estimate and correct for phylogenetic inertia in comparative data analysis. The method, called phylogenetic eigenvector regression (PVR) starts by performing a principal coordinate analysis on a pairwise phylogenetic distance matrix between species. Traits under analysis are regressed on eigenvectors retained by a broken-stick model in such a way that estimated values express phylogenetic trends in data and residuals express independent evolution of each species. This partitioning is similar to that realized by the spatial autoregressive method, but the method proposed here overcomes the problem of low statistical performance that occurs with autoregressive method when phylogenetic correlation is low or when sample size is too small to detect it. Also, PVR is easier to perform with large samples because it is based on well-known techniques of multivariate and regression analyses. We evaluated the performance of PVR and compared it with the autoregressive method using real datasets and simulations. A detailed worked example using body size evolution of Carnivora mammals indicated that phylogenetic inertia in this trait is elevated and similarly estimated by both methods. In this example, Type I error at α = 0.05 of PVR was equal to 0.048, but an increase in the number of eigenvectors used in the regression increases the error. Also, similarity between PVR and the autoregressive method, defined by correlation between their residuals, decreased by overestimating the number of eigenvalues necessary to express the phylogenetic distance matrix. To evaluate the influence of cladogram topology on the distribution of eigenvalues extracted from the double-centered phylogenetic distance matrix, we analyzed 100 randomly generated cladograms (up to 100 species). Multiple linear regression of log transformed variables indicated that the number of eigenvalues extracted by the broken-stick model can be fully explained by cladogram topology. 
Therefore, the broken-stick model is an adequate criterion for determining the correct number of eigenvectors to be used by PVR. We also simulated distinct levels of phylogenetic inertia by producing a trend across 10, 25, and 50 species arranged in “comblike” cladograms and then adding random vectors with increased residual variances around this trend. In doing so, we provide an evaluation of the performance of both methods with data generated under different evolutionary models than tested previously. The results showed that both PVR and the autoregressive method are efficient in detecting inertia in data when sample size is relatively high (more than 25 species) and when phylogenetic inertia is high. However, PVR is more efficient at smaller sample sizes and when the level of phylogenetic inertia is low. These conclusions were also supported by the analysis of 10 real datasets regarding body size evolution in different animal clades. We concluded that PVR can be a useful alternative to an autoregressive method in comparative data analysis.
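
The PVR procedure described in this abstract (principal coordinate analysis of the distance matrix, broken-stick selection, regression of the trait on the retained eigenvectors) can be sketched as below. The function names are hypothetical, and details such as squaring the distances before double-centering follow the standard principal coordinate recipe rather than anything stated in the abstract.

```python
import numpy as np

def pcoa(D):
    """Principal coordinate analysis: eigen-decompose the double-centered matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals[0]      # drop null and negative axes
    return vals[keep], vecs[:, keep]

def broken_stick_count(vals):
    """Number of leading eigenvalues whose share exceeds the broken-stick expectation."""
    p = len(vals)
    expected = np.array([np.sum(1.0 / np.arange(j, p + 1)) / p for j in range(1, p + 1)])
    observed = vals / vals.sum()
    k = 0
    while k < p and observed[k] > expected[k]:
        k += 1
    return max(k, 1)                   # always keep at least one axis

def pvr(D, trait, k=None):
    """Regress the trait on the first k eigenvectors; residuals are the specific component."""
    trait = np.asarray(trait, dtype=float)
    vals, vecs = pcoa(D)
    if k is None:
        k = broken_stick_count(vals)
    X = np.column_stack([np.ones(len(trait)), vecs[:, :k]])
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    fitted = X @ beta
    resid = trait - fitted
    r2 = 1.0 - resid.var() / trait.var()
    return fitted, resid, r2
```

Here `fitted` plays the role of the phylogenetic trend and `resid` the independent evolution of each species, in the sense used in the abstract.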

13.
We explored the impact of phylogeny shape on the results of interspecific statistical analyses incorporating phylogenetic information. In most phylogenetic comparative methods (PCMs), the phylogeny can be represented as a relationship matrix, and the hierarchical nature of interspecific phylogenies translates into a distinctive blocklike matrix that can be described by its eigenvectors (topology) and eigenvalues (branch lengths). Thus, differences in the eigenvectors and eigenvalues of different relationship matrices can be used to gauge the impact of possible phylogeny errors by comparing the actual phylogeny used in a PCM analysis with a second phylogenetic hypothesis that may be more accurate. For example, we can use the sum of inverse eigenvalues as a rough index to compare the impact of phylogenies with different branch lengths. Topological differences are better described by the eigenvectors. In general, phylogeny errors that involve deep splits in the phylogeny (e.g., moving a taxon across the base of the phylogeny) are likely to have much greater impact than will those involving small perturbations in the fine structure near the tips. Small perturbations, however, may have more of an impact if the phylogeny structure is highly dependent (with many recent splits near the tips of the tree). Unfortunately, the impact of any phylogeny difference on the results of a PCM depends on the details of the data being considered. Recommendations regarding the choice, design, and statistical power of interspecific analyses are also made.

14.
We study the effects of a signalling constraint on an individual-based model of self-organizing group formation using a coarse analysis framework. This involves using an automated data-driven technique which defines a diffusion process on the graph of a sample dataset formed from a representative stationary simulation. The eigenvectors of the graph Laplacian are used to construct 'diffusion-map' coordinates which provide a geometrically meaningful low-dimensional representation of the dataset. We show that, for the parameter regime studied, the second principal eigenvector provides a sufficient representation of the dataset and use it as a coarse observable. This allows the computation of coarse bifurcation diagrams, which are used to compare the effects of the signalling constraint on the population-level behavior of the model.
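
A basic diffusion-map construction can be sketched as follows. The Gaussian kernel, the bandwidth parameter `eps`, and the row-stochastic normalization are common choices assumed here for illustration; the abstract does not say which kernel or normalization the authors used.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Leading non-trivial eigenpairs of the Markov matrix P = D^-1 K on the data graph."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps)              # Gaussian affinities between samples
    d = K.sum(axis=1)
    # S = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]   # right eigenvectors of P
    # Index 0 is the trivial constant eigenvector with eigenvalue 1; skip it.
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]
```

For data that fall into two loosely connected groups, the first returned coordinate (the second eigenvector overall) takes opposite signs on the two groups, which is why a single coordinate of this kind can serve as a coarse observable.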

15.
16.
The completely symmetrical system is defined as having identical transfer coefficients between pairs of compartments and the same loss coefficient for each compartment. The eigenvalues and eigenvectors are found explicitly, along with the inverses of the system matrix and the matrix of eigenvectors. Many properties, special instances of more general theorems, can be seen at once from the explicit analytic solution of the initial value, washout and washin problems. The system serves as a known case for testing estimation procedures, algorithms for solutions of linear systems, eigenvalue-eigenvector and inversion routines and is of considerable tutorial value.
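
To illustrate the use as a test case, the sketch below assumes the system dx/dt = A x with n compartments, a transfer coefficient k between every pair, and a loss coefficient mu from each compartment, so that A = k J - (n k + mu) I with J the all-ones matrix. The symbols n, k, mu and the function name are notation chosen here, not taken from the abstract. Under that assumption the eigenvalues are -mu (once, with the all-ones eigenvector) and -(n k + mu) (n - 1 times).

```python
import numpy as np

def symmetric_system_matrix(n, k, mu):
    """Assumed system matrix: off-diagonal entries k, diagonal entries -((n - 1) k + mu)."""
    return k * np.ones((n, n)) - (n * k + mu) * np.eye(n)

n, k, mu = 4, 0.5, 0.2
A = symmetric_system_matrix(n, k, mu)

numeric = np.sort(np.linalg.eigvalsh(A))
analytic = np.sort(np.array([-(n * k + mu)] * (n - 1) + [-mu]))
max_error = np.max(np.abs(numeric - analytic))   # accuracy check for the eigenvalue routine
```

The same matrix can be used to check an inversion routine, since the closed-form eigen-decomposition gives the inverse directly.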

17.
18.
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalue (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2's components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining the nodes of the cut from the median, the zero, and the largest gap of µ2's components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs.
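
The three cuts can be sketched on a generic graph as follows. This is a simplified illustration: it splits the vertices by thresholding µ2 and ignores ties and the treatment of the cut vertex itself, which the RAG algorithms may handle differently. The function names are hypothetical.

```python
import numpy as np

def fiedler_pair(adj):
    """Second-smallest Laplacian eigenvalue (lambda2) and its eigenvector (mu2)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    vals, vecs = np.linalg.eigh(lap)
    return vals[1], vecs[:, 1]

def cut(mu2, rule):
    """Split vertices by thresholding mu2 at its median, at zero, or inside its largest gap."""
    if rule == "median":
        thr = np.median(mu2)
    elif rule == "sign":
        thr = 0.0
    elif rule == "gap":
        s = np.sort(mu2)
        i = int(np.argmax(np.diff(s)))
        thr = 0.5 * (s[i] + s[i + 1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    low = frozenset(np.flatnonzero(mu2 <= thr).tolist())
    high = frozenset(np.flatnonzero(mu2 > thr).tolist())
    return {low, high}

# Example: a linear tree graph with six vertices.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
lam2, mu2 = fiedler_pair(adj)
parts = {rule: cut(mu2, rule) for rule in ("median", "sign", "gap")}
```

On this symmetric example all three rules agree and split the path in the middle; on branched, asymmetric trees they generally differ, which is the case the abstract is concerned with.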

19.
One direction in exploring similarities among biological sequences (such as DNA, RNA, and proteins) is to associate with such systems ordered sets of sequence invariants. These invariants represent selected properties of mathematical objects, such as matrices, that one can associate with biological sequences. In this article, we explore properties of the recently introduced Line Distance matrices, and in particular we consider properties of their eigenvalues. We prove that Line Distance matrices of size n have one positive and n - 1 negative eigenvalues. Visual representation of Cauchy's interlacing property for Line Distance matrices is considered. Matlab programs for line distance matrices and examples are available on the following website: www.fmf.uni-lj.si/~jaklicg/ldmatrix.html.
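
The eigenvalue result can be checked numerically in a few lines. The sketch assumes that a Line Distance matrix has entries |t_i - t_j| for distinct points t_1 < ... < t_n on a line; that definition is not given in the abstract and is an assumption here, as is the function name.

```python
import numpy as np

def line_distance_matrix(t):
    """Assumed definition: entry (i, j) is the distance |t_i - t_j| between points on a line."""
    t = np.asarray(t, dtype=float)
    return np.abs(t[:, None] - t[None, :])

t = [0.0, 1.0, 3.0, 7.0, 12.0]
eigs = np.linalg.eigvalsh(line_distance_matrix(t))
n_positive = int(np.sum(eigs > 1e-9))
n_negative = int(np.sum(eigs < -1e-9))
```

For these five points the count is one positive and four negative eigenvalues, in line with the theorem stated above.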

20.
Graphs such as de Bruijn graphs and OLC (overlap-layout-consensus) graphs have been widely adopted for the de novo assembly of genomic short reads. This work studies another important problem in the field: how graphs can be used for high-performance compression of large-scale sequencing data. We present a novel graph definition named the Hamming-Shifting graph to address this problem. The definition originates from the technological characteristics of next-generation sequencing machines, aiming to link all pairs of distinct reads that have a small Hamming distance or a small shifting offset or both. We compute multiple lexicographically minimal k-mers to index the reads for an efficient search of the lightest-weight edges, and we prove a very high probability of successfully detecting these edges. The resulting graph creates a full mutual reference of the reads to cascade a code-minimized transfer of every child-read for an optimal compression. We conducted compression experiments on the minimum spanning forest of this extremely sparse graph, and achieved a 10 to 30% greater file size reduction than the best compression results using existing algorithms. As future work, the separation and connectivity degrees of these giant graphs can be used as economical measurements or protocols for quick quality assessment of wet-lab machines, for sufficiency control of genomic library preparation, and for accurate de novo genome assembly.
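
The two read-to-read relations and the k-mer index key mentioned above can be illustrated with the small helpers below. These are hypothetical sketches of the general ideas only: the paper's actual edge weights, its handling of mismatches within shifted overlaps, and its use of multiple minimal k-mers per read are not described in the abstract and are not reproduced here.

```python
def hamming(a, b):
    """Number of mismatching positions between two reads of equal length."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))

def shift_offset(a, b, max_shift):
    """Smallest s in 1..max_shift such that a[s:] equals the matching prefix of b, else None."""
    for s in range(1, max_shift + 1):
        if a[s:] == b[:len(a) - s]:
            return s
    return None

def minimal_kmer(read, k):
    """Lexicographically smallest k-mer of a read, usable as an index key."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))
```

Reads that overlap heavily tend to share a minimal k-mer, so grouping reads by this key narrows the search for close neighbors without comparing all pairs.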
