20 similar articles found (search time: 8 ms)
1.
A method is described that allows the assessment of treelikeness of phylogenetic distance data before tree estimation. This method is related to statistical geometry as introduced by Eigen, Winkler-Oswatitsch, and Dress (1988 [Proc. Natl. Acad. Sci. USA. 85:5913-5917]) and, in essence, displays a measure of the treelikeness of quartets as a histogram that we call a delta plot. This allows identification of nontreelike data and analysis of noisy data sets arising from processes such as parallel evolution, recombination, or lateral gene transfer. In addition to an overall assessment of treelikeness, individual taxa can be ranked by reference to the treelikeness of the quartets to which they belong. Removal of taxa on the basis of this ranking results in an increase in accuracy of tree estimation. Recombinant data sets are simulated, and the method is shown to be capable of identifying single recombinant taxa on the basis of distance information alone, provided the parents of the recombinant sequence are sufficiently divergent and the mixture of tree histories is not strongly skewed toward a single tree. Delta plots and taxon rankings are applied to three biological data sets using distances derived from sequence alignment, gene order, and fragment length polymorphism.
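The quartet measure behind a delta plot can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual quartet definition (for the three ways of pairing four taxa, sort the three pairwise distance sums and take (largest - middle) / (largest - smallest)), which the abstract itself does not spell out. The function names and the use of a per-taxon mean for ranking are assumptions made for this example.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Delta of one quartet: 0 for a perfectly treelike quartet, at most 1."""
    # The three ways of splitting four taxa into two pairs.
    sums = sorted([d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]])
    smallest, middle, largest = sums
    if largest == smallest:
        return 0.0  # all three sums equal: treated as treelike here (assumption)
    return (largest - middle) / (largest - smallest)


def delta_scores(d):
    """Return (all quartet deltas, mean delta per taxon) for distance matrix d."""
    n = len(d)
    per_taxon = [[] for _ in range(n)]
    all_deltas = []
    for quartet in combinations(range(n), 4):
        delta = quartet_delta(d, *quartet)
        all_deltas.append(delta)
        for taxon in quartet:
            per_taxon[taxon].append(delta)
    # Assumption: a taxon's score is the mean delta of the quartets it belongs to.
    taxon_mean = [sum(v) / len(v) if v else 0.0 for v in per_taxon]
    return all_deltas, taxon_mean


def delta_histogram(deltas, bins=10):
    """Bin delta values in [0, 1] into equal-width bins (the 'delta plot' counts)."""
    counts = [0] * bins
    for delta in deltas:
        counts[min(int(delta * bins), bins - 1)] += 1
    return counts
```

Taxa with a high mean delta would be the first candidates for removal under the ranking described in the abstract; how the scores are thresholded is left open here.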
2.
3.
MOTIVATION: Fluorescence in situ hybridization (FISH) is used to study the organization and the positioning of specific DNA sequences within the cell nucleus. Analyzing the data from FISH images is a tedious process that involves an element of subjectivity. Automated FISH image analysis saves time and offers the benefit of objective data analysis. While several FISH image analysis software tools have been developed, they often use a threshold-based segmentation algorithm for nucleus segmentation. As fluorescence signal intensities can vary significantly from experiment to experiment, from cell to cell, and within a cell, threshold-based segmentation is inflexible and often insufficient for automatic image analysis, leading to additional manual segmentation and potential subjective bias. To overcome these problems, we developed a graphical software tool called FISH Finder to automatically analyze FISH images that vary significantly. By posing nucleus segmentation as a classification problem, a compound Bayesian classifier is employed so that contextual information is utilized, resulting in reliable classification and boundary extraction. This makes it possible to analyze FISH images efficiently and objectively without adjustment of input parameters. Additionally, FISH Finder was designed to analyze the distances between differentially stained FISH probes. AVAILABILITY: FISH Finder is a standalone, platform-independent MATLAB application. The program is freely available from: http://code.google.com/p/fishfinder/downloads/list.
4.
Michael W. Hart, Sheri L. Johnson, Jason A. Addison, Maria Byrne. Invertebrate Biology, 2004, 123(4): 343-356
Abstract. Historically, characters from early animal development have been a potentially rich source of phylogenetic information, but many traits associated with the gametes and larval stages of animals with complex life cycles are widely suspected to have evolved frequent convergent similarities. Such convergences will confound true phylogenetic relationships. We compared phylogenetic inferences based on early life history traits with those from mitochondrial DNA sequences for sea stars in the genera Asterina, Cryptasterina, and Patiriella (Valvatida: Asterinidae). Analysis of these two character sets produced phylogenies that shared few clades. We quantified the degree of homoplasy in each character set when mapped onto the phylogeny inferred from the alternative characters. The incongruence between early life history and nucleotide characters implies more homoplasy in the life history character set. We suggest that the early life history traits in this case are most likely to be misleading as phylogenetic characters because simple adaptive models predict convergence in early life histories. We show that adding early life history characters may slightly improve a phylogeny based on nucleotide sequences, but adding nucleotide characters may be critically important to improving inferences from phylogenies based on early life history characters.
5.
Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/.
6.
Background
Phylogenetic trees are widely used to visualize evolutionary relationships between different organisms or samples of the same organism. A variety of free and commercial tree visualization software is available, but limitations in these programs often require researchers to use multiple programs for analysis, annotation, and the production of publication-ready images.
7.
MacT is a set of programs for the Apple Macintosh to construct and evaluate unrooted trees derived from amino acid sequences using a distance matrix method. Programs are designed on a one program-one task basis for (i) determining the branching order in trees consisting of four or five species and calculating various statistical measures, (ii) calculating statistical measures for all possible topologies of unrooted trees and (iii) generating and evaluating trees derived from bootstrapped samples. With four auxiliary programs, unrooted trees can be built for a maximum of 26 species, and the robustness of topologies can be tested by bootstrapping.
8.
Dynamic models of biochemical networks are usually described as systems of nonlinear differential equations. When such models are optimized for parameter estimation or for the design of new properties, mainly numerical methods are used. This makes optimization hard to predict, because most numerical optimization methods are stochastic and the convergence of the objective function to the global optimum is hardly predictable. Choosing a suitable optimization method and the necessary duration of optimization becomes critical when a large number of combinations of adjustable parameters must be evaluated or when the dynamic model is large. The task is complicated by the variety of optimization methods and software tools and by the nonlinearity of models in different parameter spaces. The software tool ConvAn was developed to analyze the statistical properties of convergence dynamics for optimization runs with a particular optimization method, model, software tool, set of optimization method parameters, and number of adjustable model parameters. The convergence curves can be normalized automatically to enable comparison of different methods and models on the same scale. With ConvAn's graphical user interface, which is adapted to biochemistry, different optimization methods can be compared in terms of their ability to find the global optimum, or values close to it, and the computational time needed to reach them. Optimization performance can be estimated for different numbers of adjustable parameters. ConvAn also enables statistical assessment of the optimization time required for a given optimization accuracy. Optimization methods that are unsuitable for a particular task can be rejected if they show poor repeatability or convergence properties. ConvAn is freely available at www.biosystems.lv/convan.
9.
Abouheif adapted a test for serial independence to detect a phylogenetic signal in phenotypic traits. We provide the exact analytic value of this test, revealing that it uses Moran's I statistic with a new matrix of phylogenetic proximities. We then introduce two new matrices of phylogenetic proximities and highlight their mathematical properties: matrix A, which is used in Abouheif's test, and matrix M, which is related to A and to biodiversity studies. Matrix A unifies the tests developed by Abouheif, Moran and Geary. We discuss the advantages of matrices A and M over three widely used phylogenetic proximity matrices through simulations evaluating the power and type-I error of tests for phylogenetic autocorrelation. We conclude that A enhances the power of Moran's test and is useful for unresolved trees. Data sets and routines are freely available in an online package and explained in an online supplementary file.
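For readers unfamiliar with the statistic named above, Moran's I for a trait vector and a proximity (weight) matrix can be sketched as below. The abstract does not define matrix A or M, so the `weights` argument is a generic placeholder; any phylogenetic proximity matrix could be substituted. The function name is an assumption made for this example.

```python
def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2.

    values  : list of trait values, one per tip
    weights : square matrix of proximities w_ij (placeholder for matrix A or M)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(x * x for x in dev)
    return (n / s0) * (cross / spread)
```

A significance test would typically compare the observed value with values obtained after randomly permuting the trait values across tips; that step is omitted here.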
10.
Pairwise comparison of long stretches of genomic DNA sequence can identify regions conserved across species, which often indicate functional significance. However, the novel insights frequently must be winnowed from a flood of information; for instance, running an alignment program on two 50-kilobase sequences might yield over a hundred pages of alignments. Direct inspection of such a volume of printed output is infeasible, or at best highly undesirable, and computer tools are needed to summarize the information, to assist in its analysis, and to report the findings. This paper describes two such software tools. One tool prepares publication-quality pictorial representations of alignments, while another facilitates interactive browsing of pairwise alignment data. Their effectiveness is illustrated by comparing the beta-like globin gene clusters between humans and rabbits. A second example compares the chloroplast genomes of tobacco and liverwort.
11.
Arjun B. Prasad, James C. Mullikin, Eric D. Green. Molecular Phylogenetics and Evolution, 2013, 66(3): 1067-1074
Analyses of DNA sequence datasets have repeatedly revealed inconsistencies in phylogenetic trees derived with different data. This is termed phylogenetic incongruence, and may arise from a methodological failure of the inference process or from biological processes, such as horizontal gene transfer, incomplete lineage sorting, and introgression. To better understand patterns of incongruence, we developed a method (PartFinder) that uses likelihood ratios applied to sliding windows for visualizing tree-support changes across genome-sequence alignments, allowing the comparative examination of complex phylogenetic scenarios among many species. As a pilot, we used PartFinder to investigate incongruence in the Homo-Pan-Gorilla group as well as Platyrrhini using high-quality bacterial artificial chromosome (BAC)-derived sequences as well as assembled whole-genome shotgun sequences. Our simulations verified the sensitivity of PartFinder, and our results were comparable to other studies of the Homo-Pan-Gorilla group. Analyses of the whole-genome alignments reveal significant associations between support for the accepted species relationship and specific characteristics of the genomic regions, such as GC-content, alignment score, exon content, and conservation. Finally, we analyzed sequence data generated for five platyrrhine species, and found incongruence that suggests a polytomy within Cebidae, in particular. Together, these studies demonstrate the utility of PartFinder for investigating the patterns of phylogenetic incongruence.
12.
The problem of testing for congruence between phylogenetic data has long been debated among phylogeneticists, but reaches a critical point with the availability of large amounts of biological sequence data. Notably in prokaryotes, where the amount of lateral transfer is believed to be important, the inference of phylogenies using multiple genes requires testing for incongruence before concatenating the genes. On another scale, incongruence tests can be used to detect recombination points within single gene alignments. The incongruence length difference test (ILD), based on parsimony, has proved useful for finding incongruent data sets, but its application remains limited to small data sets because of computation time. Here, we have adapted the principle of ILD to the BIONJ algorithm. This algorithm is based on a tree length minimisation criterion and is suitable to replace parsimony in this test when used with uncorrected distances (a model-free approach). We show that this new test, ILD-BIONJ, while being much faster, is often more accurate than the ILD test, especially when the alignments compared are simulated under different evolutionary models.
13.
14.
15.
Zhang Z, Xiao J, Wu J, Zhang H, Liu G, Wang X, Dai L. Biochemical and Biophysical Research Communications, 2012, 419(4): 779-781
Constructing multiple homologous alignments for protein-coding DNA sequences is crucial for a variety of bioinformatic analyses but remains computationally challenging. With the growing amount of sequence data available and the ongoing efforts largely dependent on protein-coding DNA alignments, there is an increasing demand for a tool that can process a large number of homologous groups and generate multiple protein-coding DNA alignments. Here we present a parallel tool, ParaAT, that is capable of constructing multiple protein-coding DNA alignments in parallel for a large number of homologs. As tested on empirical datasets, ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks. ParaAT is freely available for academic use only at http://cbb.big.ac.cn/software.
16.
The groupings of taxa in a phylogenetic tree cannot represent all the conflicting signals that usually occur among site patterns in aligned homologous genetic sequences. Hence a tree-building program must compromise by reporting a subset of the patterns, using some discriminatory criterion. Thus, in the worst case, out of possibly a large number of equally good trees, only an arbitrarily chosen tree might be reported by the tree-building program as "The Tree." This tree might then be used as a basis for phylogenetic conclusions. One strategy to represent conflicting patterns in the data is to construct a network. The Buneman graph is a theoretically very attractive example of such a network. In particular, a characterization for when this network will be a tree is known. Also the Buneman graph contains each of the most parsimonious trees indicated by the data. In this paper we describe a new method for constructing the Buneman graph that can be used for a generalization of Hadamard conjugation to networks. This new method differs from previous methods by allowing us to focus on local regions of the graph without having to first construct the full graph. The construction is illustrated by an example.
17.
High-throughput sequencing for microRNA (miRNA) profiling has revealed a vast complexity of miRNA processing variants, but these are difficult to discern for those without bioinformatics expertise and large computing capability. In this article, we present miRNA Sequence Profiling (miRspring) (http://mirspring.victorchang.edu.au), a software solution that creates a small portable research document that visualizes, calculates and reports on the complexities of miRNA processing. We designed an index-compression algorithm that allows the miRspring document to reproduce a complete miRNA sequence data set while retaining a small file size (typically <3 MB). Through analysis of 73 public data sets, we demonstrate miRspring’s features in assessing quality parameters, miRNA cluster expression levels and miRNA processing. Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology.
18.
SUMMARY: LumberJack is a phylogenetic tool intended to serve two purposes: to facilitate sampling treespace to find likely tree topologies quickly, and to map phylogenetic signal onto regions of an alignment in a revealing way. LumberJack creates non-random jackknifed alignments by progressively sliding a window of omission along the alignment. A neighbor-joining tree is built from the full alignment and from each jackknifed alignment, and then the likelihood for each topology (given the original full alignment) is calculated. To determine whether any of the topologies generated is significantly more likely than the others, Kishino-Hasegawa, Shimodaira-Hasegawa and ELW tests are implemented. AVAILABILITY AND SUPPLEMENTARY INFORMATION: http://www.plantbio.uga.edu/~russell/software.html
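The "window of omission" step described above can be sketched as a generator that yields one jackknifed alignment per window position. This is only an illustration of the idea, not LumberJack's code: the function name, the dictionary representation of an alignment, and the `window` and `step` parameters are assumptions, and the tree building and likelihood tests are not shown.

```python
def sliding_omission(alignment, window, step):
    """Yield (start, jackknifed_alignment) with columns [start, start + window) removed.

    alignment : dict mapping sequence name -> aligned sequence (equal lengths)
    window    : number of columns omitted in each jackknifed alignment
    step      : how far the window of omission slides between replicates
    """
    length = len(next(iter(alignment.values())))
    for start in range(0, length - window + 1, step):
        yield start, {
            name: seq[:start] + seq[start + window:]
            for name, seq in alignment.items()
        }
```

Each yielded alignment would then be passed to a tree-building routine, and the resulting topologies compared on the full alignment as the abstract describes.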
19.
Distance-based algorithms are a common technique in the construction of phylogenetic trees from taxonomic sequence data. The first step in the implementation of these algorithms is the calculation of a pairwise distance matrix to give a measure of the evolutionary change between any pair of the extant taxa. A standard technique is to use the log det formula to construct pairwise distances from aligned sequence data. We review a distance measure valid for the most general models, and show how the log det formula can be used as an estimator thereof. We then show that the foundation upon which the log det formula is constructed can be generalized to produce a previously unknown estimator which improves the consistency of the distance matrices constructed from the log det formula. This distance estimator provides a consistent technique for constructing quartets from phylogenetic sequence data under the assumption of the most general Markov model of sequence evolution.
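As background for the log det formula mentioned above, the sketch below computes one commonly used normalized form for a pair of aligned nucleotide sequences: d = -(1/4) [ln det F - (1/2)(sum ln pi_x + sum ln pi_y)], where F is the matrix of joint base frequencies and pi_x, pi_y are its row and column sums. The 1/4 scaling and the normalization are conventions assumed here; the abstract does not state which variant it uses, and the new estimator it proposes is not reproduced. All function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """Normalized log det distance between two aligned sequences (sketch)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    k = len(ALPHABET)
    counts = [[0.0] * k for _ in range(k)]
    sites = 0
    for a, b in zip(seq_x, seq_y):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
            sites += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    f = [[c / sites for c in row] for row in counts]
    row_freq = [sum(row) for row in f]
    col_freq = [sum(f[i][j] for i in range(k)) for j in range(k)]
    det_f = determinant(f)
    if det_f <= 0 or min(row_freq) == 0 or min(col_freq) == 0:
        raise ValueError("log det distance is undefined for this pair")
    norm = 0.5 * sum(math.log(p) for p in row_freq + col_freq)
    return -(math.log(det_f) - norm) / k
```

Applying `logdet_distance` to every pair of taxa would give the pairwise distance matrix that a distance-based tree method takes as input.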
20.
Emilio García‐Roselló, Cástor Guisande, Jacinto González‐Dacosta, Juergen Heine, Patricia Pelayo‐Villamil, Ana Manjarrás‐Hernández, Antonio Vaamonde, Carlos Granado‐Lorencio. Ecography, 2013, 36(11): 1202-1207
The ModestR package consists of three applications: MapMaker, DataManager and MRFinder. MapMaker facilitates making range maps by drawing the areas, by importing existing data or using the Global Biodiversity Information Facility portal. It can discriminate between different habitats, thereby making data cleaning tasks easier. DataManager allows the management of taxonomically structured databases for range maps. MRFinder supports querying ModestR databases to find the species present in specific areas. Possible applications include the compilation and management of species distribution databases, cleaning data and computing aggregated data to perform subsequent analyses in other packages thanks to emphasized interoperability.