Similar literature
 Found 20 similar articles; search took 672 ms
1.
In this paper, we present SAQ-Net, a heuristic algorithm based on simulated annealing, as a method for constructing phylogenetic networks from weighted quartets. Similar to the QNet algorithm, SAQ-Net constructs a collection of circular weighted splits of the taxa set. This collection is represented by a split network. In order to show that SAQ-Net performs better than QNet, we apply these algorithms to both simulated and actual data sets, including the Salmonella, Bees, Primates and Rubber data sets. Then we draw phylogenetic networks corresponding to the outputs of these algorithms using SplitsTree4 and compare the results. We find that SAQ-Net produces better circular orderings and phylogenetic networks than QNet in most cases. SAQ-Net has been implemented in Matlab and is available for download at http://bioinf.cs.ipm.ac.ir/softwares/saq.net.

2.
Deep learning has achieved great success in areas such as computer vision and natural language processing. Earlier work used convolutional networks to process EEG signals and matched or exceeded traditional machine learning methods. We propose a novel network structure called QNet. It contains a newly designed attention module, 3D-AM, which learns attention weights over EEG channels, time points, and feature maps, and thereby provides a way to learn electrode and time selection automatically. QNet uses a dual-branch structure to fuse bilinear vectors for classification. It performs four-, three-, and two-class classification on the EEG Motor Movement/Imagery Dataset, obtaining average cross-validation accuracies of 65.82%, 74.75%, and 82.88%, which exceed the state of the art by 7.24%, 4.93%, and 2.45%, respectively. The article also visualizes the attention weights learned by QNet and shows their possible application to electrode channel selection.

3.
Application of phylogenetic networks in evolutionary studies   Total citations: 42 (self-citations: 0, citations by others: 42)
The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

4.
Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale to large data volumes. Furthermore, it uses the phylogeny‐aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We use simulated data to verify our approach. Tests on an empirical data set show that scrapp‐derived metrics can classify samples by their diversity‐correlated features as well as or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp.

5.
We describe a method that will reconstruct an unrooted binary phylogenetic level-1 network on \(n\) taxa from the set of all quartets containing a certain fixed taxon, in \(O(n^3)\) time. We also present a more general method which can handle more diverse quartet data, but which takes \(O(n^6)\) time. Both methods proceed by solving a certain system of linear equations over the two-element field \(\mathrm{GF}(2)\). For a general dense quartet set, i.e. a set containing at least one quartet on every four taxa, our \(O(n^6)\) algorithm constructs a phylogenetic level-1 network consistent with the quartet set if such a network exists and returns an \(O(n^2)\)-sized certificate of inconsistency otherwise. This answers a question raised by Gambette, Berry and Paul regarding the complexity of reconstructing a level-1 network from a dense quartet set, and more particularly regarding the complexity of constructing a cyclic ordering of taxa consistent with a dense quartet set.
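The abstract does not say how the quartets are encoded as equations, so the following is only a minimal sketch of the generic building block it names: solving a linear system over \(\mathrm{GF}(2)\) by Gaussian elimination. The function name `solve_gf2` and the 0/1 list encoding are assumptions made for illustration, and the sketch merely reports inconsistency rather than producing the certificate the paper describes.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over GF(2) (hypothetical helper, not the paper's code).

    rows: list of equal-length lists of 0/1 coefficients.
    rhs:  list of 0/1 right-hand sides.
    Returns one solution (free variables set to 0), or None if inconsistent.
    """
    n_vars = len(rows[0]) if rows else 0
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivot_cols = []
    r = 0
    for c in range(n_vars):
        if r == len(aug):
            break
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue  # column c corresponds to a free variable
        aug[r], aug[p] = aug[p], aug[r]
        # Addition in GF(2) is XOR; clear column c in every other row.
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # A row reading 0 = 1 means the system has no solution.
    if any(not any(row[:-1]) and row[-1] for row in aug):
        return None
    x = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        x[c] = aug[i][-1]
    return x


# Example: x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (all mod 2).
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
solution = solve_gf2(A, [1, 0, 1])
no_solution = solve_gf2(A, [1, 0, 0])
```

Dense elimination on a system with \(m\) equations and \(k\) unknowns costs \(O(m k^2)\) bit operations, which gives a feel for how polynomial bounds such as those quoted above can arise once the number of unknowns is polynomial in \(n\).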

6.
In recent studies, phylogenetic networks have been derived from so-called multilabeled trees in order to understand the origins of certain polyploids. Although the trees used in these studies were constructed using sophisticated techniques in phylogenetic analysis, the presented networks were inferred using ad hoc arguments that cannot be easily extended to larger, more complicated examples. In this paper, we present a general method for constructing such networks, which takes as input a multilabeled phylogenetic tree and outputs a phylogenetic network with certain desirable properties. To illustrate the applicability of our method, we discuss its use in reconstructing the evolutionary history of plant allopolyploids. We conclude with a discussion concerning possible future directions. The network construction method has been implemented and is freely available for use from http://www.uea.ac.uk/~a043878/padre.html.

7.
  1. Ecological networks are valuable for ecosystem analysis but their use is often limited by a lack of data because many types of ecological interaction, for example, predation, are short‐lived and difficult to observe or detect. While there are different methods for inferring the presence of interactions, they have rarely been used to predict the interaction strengths that are required to construct weighted, or quantitative, ecological networks.
  2. Here, we develop a trait‐based approach suitable for inferring weighted networks, that is, with varying interaction strengths. We developed the method for seed‐feeding carabid ground beetles (Coleoptera: Carabidae) although the principles can be applied to other species and types of interaction.
  3. Using existing literature data from experimental seed‐feeding trials, we predicted a per‐individual interaction cost index based on carabid and seed size. This was scaled up to the population level to create inferred weighted networks using the abundance of carabids and seeds from empirical samples and energetic intake rates of carabids from the literature. From these weighted networks, we also derived a novel measure of expected predation pressure per seed type per network.
  4. This method was applied to existing ecological survey data from 255 arable fields with carabid data from pitfall traps and plant seeds from seed rain traps. Analysis of these inferred networks led to testable hypotheses about how network structure and predation pressure varied among fields.
  5. Inferred networks are valuable because (a) they provide null models for the structuring of food webs to test against empirical species interaction data, for example, DNA analysis of carabid gut regurgitates and (b) they allow weighted networks to be constructed whenever we can estimate interactions between species and have ecological census data available. This permits ecological network analysis even at times and in places when interactions were not directly assessed.

8.
Cell-free systems containing multiple enzymes are becoming an increasingly interesting tool for one-pot syntheses of biochemical compounds. To extensively explore the enormous wealth of enzymes in the biological space, we present methods for assembling and curating data from databases to apply them for the prediction of pathway candidates for directed enzymatic synthesis. We use the Kyoto Encyclopedia of Genes and Genomes to establish single-organism models and a pan-organism model that combines the available data from all organisms listed there. We introduce a filtering scheme to remove data that are not suitable, for example, generic metabolites and general reactions. In addition, a valid stoichiometry of reactions is required for acceptance. The networks created are analyzed by graph theoretical methods to identify a set of metabolites that are potentially reachable from a defined set of starting metabolites. Thus, metabolites not connected to such starting metabolites cannot be produced unless new starting metabolites or reactions are introduced. The network models also comprise stoichiometric and thermodynamic data that allow the definition of constraints to identify potential pathways. The resulting data can be directly applied using existing or future pathway finding tools.

9.
The inverse normal and Fisher's methods are two common approaches for combining P-values. Whitlock demonstrated that a weighted version of the inverse normal method, or 'weighted Z-test', is superior to Fisher's method for combining P-values for one-sided T-tests. The problem with Fisher's method is that it does not take advantage of weighting and loses power to the weighted Z-test when studies are differently sized. This issue was recently revisited by Chen, who observed that Lancaster's variation of Fisher's method had higher power than the weighted Z-test. Nevertheless, the weighted Z-test has comparable power to Lancaster's method when its weights are set to square roots of sample sizes. Power can be further improved when additional information is available. Although there is no single approach that is the best in every situation, the weighted Z-test enjoys certain properties that make it an appealing choice as a combination method for meta-analysis.
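The two combination rules compared above can be sketched as follows. This is a minimal illustration using the standard textbook formulas, not code from the article: the weighted Z-test computes \(Z_w = \sum_i w_i Z_i / \sqrt{\sum_i w_i^2}\) with \(Z_i = \Phi^{-1}(1 - p_i)\), and Fisher's method refers \(-2 \sum_i \ln p_i\) to a chi-square distribution with \(2k\) degrees of freedom. The function names and the example sample sizes are hypothetical; the weights follow the abstract's suggestion of square roots of sample sizes.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def weighted_z_pvalue(pvalues, weights):
    """Combine one-sided p-values with the weighted inverse normal method."""
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in pvalues]
    z_w = sum(w * z for w, z in zip(weights, z_scores))
    z_w /= sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z_w)


def fisher_pvalue(pvalues):
    """Combine p-values with Fisher's (unweighted) method."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function for an even number (2k) of degrees of
    # freedom has the closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return exp(-half_stat) * total


# Hypothetical example: three studies of different size.
sample_sizes = [25, 100, 400]
p_values = [0.20, 0.04, 0.01]
weights = [sqrt(n) for n in sample_sizes]
combined_z = weighted_z_pvalue(p_values, weights)
combined_fisher = fisher_pvalue(p_values)
```

With equal weights the weighted Z-test reduces to the ordinary inverse normal (Stouffer) method, so the weights are the only place where study size enters.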

10.
Sim J, Kim SY, Lee J. Proteins, 2005, 59(3): 627-632
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries), that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with accuracy of about 66% using the ±20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/.

11.
ABSTRACT: BACKGROUND: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. RESULTS: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. CONCLUSIONS: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.

12.
Summary: We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree since no tree can exist with a length less than that of the lower bound. The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 13,113,070,457,687,988,603,440,625 topologically distinct trees that can be constructed from this data set. The method described in this paper does identify 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It may not always be possible to prove a tree minimal, but the method will give an upper and a lower bound on the length of minimal trees.
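The size of the search space mentioned above can be reproduced with a short calculation. The sketch below assumes that "topologically distinct trees" means unrooted, fully resolved (binary) trees with labelled leaves, for which the standard count on \(n\) taxa is the double factorial \((2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)\). The function name is hypothetical.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of unrooted binary trees on n labelled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


# Python integers have arbitrary precision, so the 26-digit count
# for 23 sequences is computed exactly.
trees_for_23 = count_unrooted_binary_trees(23)
```

Exhaustive enumeration is clearly out of reach at this scale, which is why a matching lower bound is needed to prove that a candidate tree is minimal.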

13.
A critical assessment of sequencing markers is desirable to ensure that they are appropriate for the specific questions that are to be addressed. This consideration is particularly important where the data set will be used in highly sensitive analyses such as molecular clock studies. However, there is no standard practice for marker assessment. We examined the mitochondrial DNA sequences of a genus of marine molluscs to assess the relative phylogenetic signal of a number of genes using an extension of splits‐based spectral analysis. With a data set of almost 8 kb of DNA sequences from the mitochondrial genome of a lineage of marine molluscs, we compared the phylogenetic information content of six protein coding, two ribosomal DNA, and 12 transfer RNA genes. Split‐support graphs were used to identify which genes contributed a relatively low signal‐to‐noise ratio of phylogenetic information. We found that cox2 and atp8 did not perform well for reconstruction at the within‐genus level for this lineage. Consideration of nested subsets of taxa improved the resolution of relationships among closely related species by reducing the time frame over which evolutionary processes have occurred, allowing a better fit for models of DNA substitution. Through this fine‐tuning of available data it is possible to generate phylogenetic reconstructions of increased robustness, for which there is a greater understanding of the underlying signals in the data. We recommend a suitable mitochondrial DNA fragment and new primers for intergeneric studies of molluscs, and outline a general pipeline for phylogenetic analysis. © 2011 The Linnean Society of London, Biological Journal of the Linnean Society, 2011, 104, 770–785.

14.
This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly, revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis.

15.
We study the equilibrium in the use of synonymous codons by eukaryotic organisms and find five equations involving substitution rates that we believe embody the important implications of equilibrium for the process of silent substitution. We then combine these five equations with additional criteria to determine sets of substitution rates applicable to eukaryotic organisms. One method employs the equilibrium equations and a principle of maximum entropy to find the most uniform set of rates consistent with equilibrium. In a second method we combine the equilibrium equations with data on the man-mouse divergence to determine that set of rates that is most neutral yet consistent with both types of data (i.e., equilibrium and divergence data). Simulations show this second method to be quite reliable in spite of significant saturation in the substitution process. We find that when divergence data are included in the calculation of rates, even though these rates are chosen to be as neutral as possible, the strength of selection inferred from the nonuniformity of the rates is approximately doubled. Both sets of rates are applied to estimate the human-mouse divergence time based on several independent subsets of the divergence data consisting of the quartet, C- or T-ending duet, and A- or G-ending duet codon sets. Both rate sets produce patterns of divergence times that are shortest for the quartet data, intermediate for the CT-ending duets, and longest for the AG-ending duets. This indicates that rates of transitions in the duet-codon sets are significantly higher than those in the quartet-codon sets; this effect is especially marked for A→G, the rate of which in duets must be about double that in quartets.

16.
Phylogenetic trees correspond one-to-one to compatible systems of splits and so splits play an important role in theoretical and computational aspects of phylogeny. Whereas any tree reconstruction method can be thought of as producing a compatible system of splits, an increasing number of phylogenetic algorithms are available that compute split systems that are not necessarily compatible and, thus, cannot always be represented by a tree. Such methods include the split decomposition, Neighbor-Net, consensus networks, and the Z-closure method. A more general split system of this kind can be represented graphically by a so-called splits graph, which generalizes the concept of a phylogenetic tree. This paper addresses the problem of computing a splits graph for a given set of splits. We have implemented all presented algorithms in a new program called SplitsTree4.
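The notion of compatibility used above has a simple set-based test, sketched here under the standard definition: two splits \(A_1|A_2\) and \(B_1|B_2\) of the same taxon set are compatible if at least one of the four intersections \(A_i \cap B_j\) is empty, and a split system is compatible if every pair of its splits is. The function names and the convention of storing a split by one of its sides are assumptions for illustration; this is not SplitsTree4 code.

```python
from itertools import combinations


def splits_compatible(side_a, side_b, taxa):
    """Return True if the splits side_a|rest and side_b|rest are compatible."""
    a1 = frozenset(side_a)
    b1 = frozenset(side_b)
    a2, b2 = taxa - a1, taxa - b1
    # Compatible iff at least one of the four pairwise intersections is empty.
    return any(not (x & y) for x in (a1, a2) for y in (b1, b2))


def system_compatible(split_sides, taxa):
    """A split system fits on a tree iff its splits are pairwise compatible."""
    return all(
        splits_compatible(s, t, taxa) for s, t in combinations(split_sides, 2)
    )


taxa = frozenset("abcd")
# ab|cd together with the trivial split a|bcd can be displayed on one tree.
tree_like = system_compatible([{"a", "b"}, {"a"}], taxa)
# ab|cd and ac|bd conflict, so they need a network (a splits graph).
conflicting = system_compatible([{"a", "b"}, {"a", "c"}], taxa)
```

A method that outputs a system failing this check is exactly the situation in which a splits graph, rather than a tree, is required.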

17.
MOTIVATION: Accurate time series for biological processes are difficult to estimate due to problems of synchronization, temporal sampling and rate heterogeneity. Methods are needed that can utilize multi-dimensional data, such as those resulting from DNA microarray experiments, in order to reconstruct time series from unordered or poorly ordered sets of observations. RESULTS: We present a set of algorithms for estimating temporal orderings from unordered sets of sample elements. The techniques we describe are based on modifications of a minimum-spanning tree calculated from a weighted, undirected graph. We demonstrate the efficacy of our approach by applying these techniques to an artificial data set as well as several gene expression data sets derived from DNA microarray experiments. In addition to estimating orderings, the techniques we describe also provide useful heuristics for assessing relevant properties of sample datasets such as noise and sampling intensity, and we show how a data structure called a PQ-tree can be used to represent uncertainty in a reconstructed ordering. AVAILABILITY: Academic implementations of the ordering algorithms are available as source code (in the programming language Python) on our web site, along with documentation on their use. The artificial 'jelly roll' data set upon which the algorithm was tested is also available from this web site. The publicly available gene expression data may be found at http://genome-www.stanford.edu/cellcycle/ and http://caulobacter.stanford.edu/CellCycle/.
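As a rough illustration of the kind of computation described above, the sketch below builds a minimum spanning tree over samples with Prim's algorithm and then reads off the longest path in that tree as a candidate ordering. Both the use of Euclidean distance as the edge weight and the "longest path" rule are assumptions made for this example; the abstract only says the techniques are modifications of a minimum spanning tree, and this is not the authors' implementation.

```python
import heapq
from math import dist


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph with Euclidean edge weights."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    visited = {0}
    heap = [(dist(points[0], points[j]), 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    while heap and len(visited) < n:
        weight, i, j = heapq.heappop(heap)
        if j in visited:
            continue  # stale entry: j was already attached via a cheaper edge
        visited.add(j)
        adjacency[i].append((j, weight))
        adjacency[j].append((i, weight))
        for k in range(n):
            if k not in visited:
                heapq.heappush(heap, (dist(points[j], points[k]), j, k))
    return adjacency


def farthest_path(adjacency, start):
    """Path from start to the node with the largest tree distance from it."""
    best_length, best_path = 0.0, [start]
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, length, path = stack.pop()
        if length > best_length:
            best_length, best_path = length, path
        for neighbour, weight in adjacency[node]:
            if neighbour != parent:
                stack.append((neighbour, node, length + weight, path + [neighbour]))
    return best_path


def order_samples(points):
    """Candidate ordering: the longest (diameter) path of the spanning tree."""
    tree = minimum_spanning_tree(points)
    one_end = farthest_path(tree, 0)[-1]
    return farthest_path(tree, one_end)


# Four samples taken along a line but stored out of order.
samples = [(3.0, 0.0), (0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
ordering = order_samples(samples)
```

Samples that lie off the longest path are left unplaced by this sketch; handling such branches is where a structure like the PQ-tree mentioned in the abstract would come in. The direction of the recovered ordering is arbitrary, since an unordered sample set carries no arrow of time.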

18.
We analyze the performance of quartet methods in phylogenetic reconstruction. These methods first compute four-taxon trees (4-trees) and then use a combinatorial algorithm to infer a phylogeny that respects the inferred 4-trees as much as possible. Quartet puzzling (QP) is one of the few methods able to take weighting of the 4-trees, which is inferred by maximum likelihood, into account. QP seems to be widely used. We present weight optimization (WO), a new algorithm which is also based on weighted 4-trees. WO is faster and offers better theoretical guarantees than QP. Moreover, computer simulations indicate that the topological accuracy of WO is less dependent on the shape of the correct tree. However, although the performance of WO is better overall than that of QP, it is still less efficient than traditional phylogenetic reconstruction approaches based on pairwise evolutionary distances or maximum likelihood. This is likely related to long-branch attraction, a phenomenon to which quartet methods are very sensitive, and to inappropriate use of the initial results (weights) obtained by maximum likelihood for every quartet.

19.
20.
1 Introduction. The electrocardiogram (ECG) offers a lot of important information for the diagnosis of heart diseases. Because some abnormal ECGs occur in unknown situations, they can be caught in long-time continuous monitoring. A monitoring system which can record 24-hour ECG data is one of the effective means to provide this. Although large-scale IC memories have been developed and are available to store long-time ECG data, it is very difficult and troublesome when a large number of data is preserved and stored or transmitted. In the digitized ECG data, there are…
