Similar Documents
1.
Phylogenetic analyses today involve handling computer files in different formats and, often, several computer programs. Although some widely used applications have integrated important functionalities for such analyses, they still work with local resources only: input/output files (which users have to manage) and local computing (users sometimes have to leave programs running on their desktop computers for extended periods of time). To address these problems we have developed 'Bosque', a multi-platform client-server software that performs standard phylogenetic tasks either locally or remotely on servers, and integrates the results in a local relational database. Bosque performs sequence alignment as well as graphical visualization and editing of trees, thus providing a powerful environment that integrates all the steps of phylogenetic analyses. AVAILABILITY: http://bosque.udec.cl

2.
Estimating Phylogenies of Species (EPoS) is a modular software framework for phylogenetic analysis, visualization and data management. It provides a plugin-based system that integrates a storage facility, a rich user interface and the ability to easily incorporate new methods, functions and visualizations. EPoS ships with persistent data management, a set of well-known phylogenetic algorithms and a multitude of tree visualization methods and layouts. Implemented algorithms cover distance-based tree construction, consensus trees and various graph-based supertree methods. The rendering system can be customized, for example with different edge and node styles.
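
The abstract does not say which distance-based methods EPoS implements. As a minimal illustration of what such a method does, the Python sketch below implements UPGMA (average linkage), one classic distance-based method; it is not EPoS code, and the dictionary-of-pairs input and nested-tuple output are assumptions chosen for brevity.

```python
def upgma(dist):
    """Build a rooted tree by UPGMA (average linkage).

    dist: dict mapping frozenset({a, b}) -> distance for every pair of leaf names.
    Returns nested tuples, e.g. (("A", "B"), ("C", "D")); branch lengths are omitted.
    """
    size = {leaf: 1 for pair in dist for leaf in pair}  # cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Closest pair of clusters; ties broken by name so the output is deterministic.
        pair = min(d, key=lambda p: (d[p], sorted(map(str, p))))
        a, b = sorted(pair, key=str)
        merged = (a, b)
        # Distance from the new cluster to every other cluster: size-weighted average.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # Drop every distance that involves one of the two clusters just merged.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(size))


pairs = [(("A", "B"), 2.0), (("C", "D"), 4.0), (("A", "C"), 10.0),
         (("A", "D"), 10.0), (("B", "C"), 10.0), (("B", "D"), 10.0)]
print(upgma({frozenset(p): v for p, v in pairs}))  # (('A', 'B'), ('C', 'D'))
```

UPGMA assumes a roughly constant rate of evolution; neighbor joining is the usual alternative when that assumption is doubtful.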

3.
MOTIVATION: The increasing availability of phylogenetic and trait data for communities of co-occurring species has created a need for software that integrates ecological and evolutionary analyses. Capabilities: Phylocom calculates numerous metrics of phylogenetic community structure and trait similarity within communities. Hypothesis testing is implemented using several null models. Within the same framework, it measures phylogenetic signal and correlated evolution for species traits. A range of utility functions supports manipulation of community and phylogenetic data, generation of trees and traits, and integration into scientific workflows. Availability: Open source at: http://phylodiversity.net/phylocom/.
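
The abstract does not spell out Phylocom's metrics or null models. As a hedged sketch of the general idea, the Python below computes the mean pairwise phylogenetic distance (MPD) of one community and compares it with random communities of the same size drawn from the species pool, giving a net-relatedness-style index. The distance table, the pool and the single null model are illustrative assumptions, not Phylocom's implementation.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among the taxa in a community.

    dist: dict mapping frozenset({a, b}) -> patristic distance.
    """
    return mean(dist[frozenset(p)] for p in combinations(sorted(community), 2))


def nri(community, pool, dist, n_reps=999, seed=1):
    """Net relatedness index: -(MPD_observed - mean(MPD_null)) / sd(MPD_null).

    Null model (an assumption here): random communities of the same size from the pool.
    Positive values indicate phylogenetic clustering, negative values overdispersion.
    """
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(sorted(pool), len(community)), dist) for _ in range(n_reps)]
    return -(observed - mean(null)) / stdev(null)


# Hypothetical pool: A and B are close relatives, as are C and D.
dist = {frozenset(p): v for p, v in [(("A", "B"), 2), (("C", "D"), 2), (("A", "C"), 10),
                                     (("A", "D"), 10), (("B", "C"), 10), (("B", "D"), 10)]}
print(mpd({"A", "B"}, dist))                          # 2
print(nri({"A", "B"}, {"A", "B", "C", "D"}, dist) > 0)  # True: the community is clustered
```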

4.

Background  

Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software capable of managing, annotating, and analyzing the large and diverse data sets accumulated in these projects.

6.
Summary: The Summary Tree Explorer (STE) is a Java application for interactively exploring sets of phylogenetic trees using two coupled representations: a node-and-link diagram and a textual list of common clades. Selection, pruning, filtering or re-rooting in one representation is immediately reflected in the other. While summary trees are more effective at showing the relationship among clades, they can only show a consistent subset of those that appear in the textual list. Working with both representations mitigates the disadvantages of having to choose just one. Availability: STE, along with several sample datasets, is available at http://cityscape.inf.cs.cmu.edu/phylogeny/ Contact: mad{at}cs.cmu.edu Associate Editor: Martin Bishop
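
A textual list of common clades amounts to a table of clade frequencies across the tree set. A minimal Python sketch, assuming rooted trees written as nested tuples over the same taxa (STE itself is a Java application and its internals are not described here):

```python
from collections import Counter


def clades(tree):
    """All clades of a rooted nested-tuple tree, each as a frozenset of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_table(trees):
    """(frequency, sorted taxa) for every non-root clade, most frequent first."""
    counts = Counter(c for t in trees for c in clades(t))
    n_taxa = max(len(c) for c in counts)  # the root clade contains every taxon
    rows = [(k / len(trees), tuple(sorted(c))) for c, k in counts.items() if len(c) < n_taxa]
    return sorted(rows, key=lambda r: (-r[0], r[1]))


trees = [((("A", "B"), "C"), "D"), ((("A", "B"), "D"), "C"), ((("A", "C"), "B"), "D")]
for freq, taxa in clade_table(trees):
    print(f"{freq:.2f}  {' '.join(taxa)}")
# 0.67  A B
# 0.67  A B C
# 0.33  A B D
# 0.33  A C
```

Clades with frequency above 0.5 are exactly those a majority-rule summary tree would display; the rest can only appear in the list, which is the trade-off the abstract describes.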

7.
Applications of DNA tiling arrays for whole-genome analysis (times cited: 26; self-citations: 0; citations by others: 26)

8.
A total of 37 complete genome sequences of bacteria, archaea, and eukaryotes were compared. The percentage of orthologous genes of each species contained within each of the other 36 genomes was established. In addition, the mean identity of the orthologs was calculated. Several conclusions result: (i) a greater absolute number of orthologs of a given species is found in larger species than in smaller ones; (ii) a greater percentage of the orthologous genes of smaller genomes is contained in other species than is the case for larger genomes, which corresponds to a larger proportion of essential genes; (iii) before species can be specifically related to one another in terms of gene content, it is first necessary to correct for genome size; (iv) eukaryotes have a significantly smaller percentage of bacterial orthologs after correction for genome size, which is consistent with their placement in a separate domain; (v) the archaebacteria are specifically related to one another but are not significantly different in gene content from the bacteria as a whole; (vi) determining the mean identity of all orthologs (involving hundreds of gene comparisons per genome pair) reduces the impact of errors due to misidentification of orthologs and to misalignments, and is thus far more reliable than single-gene comparisons; (vii) however, the amount of change in protein sequences reaches a maximum, corresponding to 37% mean identity, which limits the use of percentage sequence identity to the lower taxa, a result that should also hold for single-gene comparisons of both proteins and rRNA; (viii) most of the species that appear to be specifically related based upon gene content also appear to be specifically related based upon the mean identity of orthologs; (ix) the genes of a majority of the species considered in this study have diverged too much to allow the construction of all-encompassing evolutionary trees.
However, we have shown that eight species of gram-negative bacteria, six species of gram-positive bacteria, and eight species of archaebacteria are specifically related in terms of gene content, mean identity of orthologs, or both.
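
The mean identity of orthologs is an average of pairwise percent identities. The sketch below assumes the ortholog pairs are already aligned and, as one common convention (an assumption, since the abstract does not define it), ignores columns where either sequence has a gap:

```python
from statistics import mean


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def mean_ortholog_identity(aligned_pairs):
    """Average identity over all ortholog pairs shared by two genomes."""
    return mean(percent_identity(a, b) for a, b in aligned_pairs)


pairs = [("MKTAYI", "MKSAYI"), ("MK-LV", "MKQLI")]  # hypothetical aligned orthologs
print(round(mean_ortholog_identity(pairs), 2))  # 79.17
```

Averaging over hundreds of such pairs is what dampens the effect of any single misidentified ortholog or misalignment.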

10.
The sequencing and analysis of multiple housekeeping genes has been routinely used to phylogenetically compare closely related bacterial isolates. Recent studies using whole-genome alignment (WGA) and phylogenetics of >100 Escherichia coli genomes have demonstrated that tree topologies from WGA and from multilocus sequence typing (MLST) markers differ significantly. A nonrepresentative phylogeny can lead to incorrect conclusions about important evolutionary relationships. In this study, the Phylomark algorithm was developed to identify a minimal number of useful phylogenetic markers that recapitulate the WGA phylogeny. To test the algorithm, we used a set of diverse draft and complete E. coli genomes. The algorithm identified more than 100,000 potential markers of different fragment lengths (500 to 900 nucleotides). Three molecular markers were ultimately chosen to determine the phylogeny, based on their low Robinson-Foulds (RF) distance from the WGA phylogeny. A phylogenetic analysis demonstrated that a concatenation of these markers yielded a more representative phylogeny than all other MLST schemes for E. coli. As a functional test of the algorithm, the three markers (genomic guided E. coli markers, or GIG-EM) were amplified and sequenced from a set of environmental E. coli strains (ECOR collection) and extracted in silico from a set of 78 diarrheagenic E. coli strains (DECA collection). For both the 40-genome test set and the DECA collection, the GIG-EM system outperformed other E. coli MLST systems in recapitulating the WGA phylogeny. This algorithm can be used to determine the minimal marker set for any organism with sufficient genome sequence data.
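
The Robinson-Foulds distance used to rank candidate markers counts the bipartitions (splits) present in one tree but not in the other. A minimal Python sketch of the unnormalized, unrooted version for trees given as nested tuples; it is not Phylomark's code, and the tree encoding is an assumption:

```python
def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples (rooting is ignored).

    Each split is stored as the side that does NOT contain a fixed reference taxon,
    so the same bipartition always gets the same representation.
    """
    clusters = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        c = frozenset().union(*(walk(child) for child in node))
        clusters.append(c)
        return c

    taxa = walk(tree)  # the root cluster holds every taxon
    ref = min(taxa)
    result = set()
    for c in clusters:
        side = taxa - c if ref in c else c
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial splits
            result.add(side)
    return taxa, result


def rf_distance(t1, t2):
    """Unnormalized Robinson-Foulds distance: splits found in exactly one of the trees."""
    taxa1, s1 = splits(t1)
    taxa2, s2 = splits(t2)
    if taxa1 != taxa2:
        raise ValueError("trees must have the same taxa")
    return len(s1 ^ s2)


wga_tree = ((("A", "B"), "C"), ("D", "E"))     # hypothetical reference topology
marker_tree = ((("A", "C"), "B"), ("D", "E"))  # hypothetical single-marker topology
print(rf_distance(wga_tree, marker_tree))  # 2
print(rf_distance(wga_tree, wga_tree))     # 0
```

In a marker search, each candidate's tree would be scored this way against the WGA tree and the lowest-scoring candidates kept.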

11.
ABSTRACT: BACKGROUND: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, individual hypothesis testing of SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure; this multiple testing problem causes unacceptable false negative results. A number of SNP-SNP pairs far larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is therefore needed. RESULTS: We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) [17] to handle the very large number of SNP-SNP interactions appropriately, by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure based on the SIS method and suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case-Control Consortium) data. CONCLUSIONS: Based on machine-learning principles, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
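
In its basic form, sure independence screening ranks predictors by the magnitude of their marginal correlation with the response and keeps the top d, with d = [n/log n] a commonly used choice. The Python sketch below applies that idea to SNP-SNP interaction terms. The single product coding of additive genotypes (0/1/2) and the use of Pearson correlation with a 0/1 phenotype are simplifying assumptions, not the dummy codings or the logistic-regression procedure of EPISIS.

```python
from itertools import combinations
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def sis_interactions(genotypes, phenotype, d=None):
    """Screen all SNP pairs and keep the d whose interaction term is most strongly
    (in absolute value) correlated with the phenotype.

    genotypes: one list per SNP of allele counts (0/1/2) per individual.
    phenotype: 0/1 case-control status per individual.
    """
    n = len(phenotype)
    if d is None:
        d = max(1, int(n / log(n)))  # SIS default d = [n / log n]
    scores = []
    for i, j in combinations(range(len(genotypes)), 2):
        term = [a * b for a, b in zip(genotypes[i], genotypes[j])]  # assumed product coding
        scores.append((abs(pearson(term, phenotype)), (i, j)))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [pair for _, pair in scores[:d]]


# Toy data: cases are exactly the individuals carrying alleles at both SNP 0 and SNP 1.
G = [[0, 1, 2, 0, 1, 2, 0, 1],
     [0, 1, 2, 1, 0, 2, 2, 1],
     [1, 0, 1, 0, 1, 0, 1, 0]]
y = [0, 1, 1, 0, 0, 1, 0, 1]
print(sis_interactions(G, y, d=1))  # [(0, 1)]
```

Screening is embarrassingly parallel over pairs, which is why it maps well onto GPU hardware for exhaustive genome-wide searches.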

12.
An R package for analysis of whole-genome association studies (times cited: 3; self-citations: 0; citations by others: 3)
OBJECTIVE: To provide data classes and methods that facilitate the analysis of whole-genome association studies in the R language for statistical computing. METHODS: We have implemented data classes in which each genotype call is stored as a single byte. At this density, data for single chromosomes derived from large studies and new high-throughput gene chip platforms can be handled in memory. We use the object-oriented programming model introduced with version 4 of the S-plus package, usually termed 'S4 methods'. RESULTS: At its current stage of development, the package supports only population-based studies, although we hope to provide support for family-based studies soon. Both quantitative and qualitative phenotypes may be analysed. Flexible association testing functions are provided that can carry out single-SNP tests controlling for potential confounding by quantitative and qualitative covariates. Tests involving several SNPs taken together as 'tags' are also supported. Efficient calculation of pairwise linkage disequilibrium measures is implemented, and the data input functions include one that can download data directly from the International HapMap Project website.
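
Storing one genotype call per byte makes memory use easy to estimate: for example, 2,000 individuals typed at 50,000 SNPs on one chromosome need 2,000 × 50,000 = 100,000,000 bytes, about 100 MB (illustrative numbers). The Python sketch below (the package itself is written in R) packs calls into a bytearray using a hypothetical coding and computes a genotype-based r² between two SNPs; the byte codes are assumptions, not the package's internal format.

```python
MISSING = 0  # hypothetical byte code; 1, 2, 3 = genotypes carrying 0, 1, 2 copies of allele B


def pack_genotypes(calls):
    """Store one genotype call per byte. calls: allele-B counts (0, 1, 2) or None if missing."""
    return bytearray(MISSING if g is None else g + 1 for g in calls)


def r_squared(snp_a, snp_b):
    """Squared Pearson correlation of allele-B counts between two SNPs, using individuals
    typed at both; a genotype-based stand-in for the haplotype-based LD measure r^2."""
    pairs = [(a - 1, b - 1) for a, b in zip(snp_a, snp_b) if a != MISSING and b != MISSING]
    if not pairs:
        raise ValueError("no individuals typed at both SNPs")
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    if saa == 0 or sbb == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return sab * sab / (saa * sbb)


snp1 = pack_genotypes([0, 1, 2, None, 2, 0])
snp2 = pack_genotypes([2, 1, 0, 1, 0, 2])
print(len(snp1), r_squared(snp1, snp2))  # 6 1.0
```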

13.
TOPD/FMTS: a new software to compare phylogenetic trees (times cited: 1; self-citations: 0; citations by others: 1)
SUMMARY: TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets and Quartets) for comparing phylogenetic trees. One novelty of this software is that the FMTS (From Multiple to Single) program allows the comparison of trees that contain both orthologs and paralogs. Each option is complemented with a randomization analysis to test the null hypothesis that the similarity between two trees is no better than expected by chance. AVAILABILITY: The Perl source code of TOPD/FMTS is available at http://genomes.urv.es/topd.
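
The abstract only summarizes the randomization procedure. One plausible way to build such a test, shown here as a hedged Python sketch, is to shuffle the taxon labels of one tree many times and count how often the shuffled tree is at least as similar to the other tree as the real one. The similarity score (fraction of shared clusters) is a stand-in chosen for brevity, not TOPD's Partition or Nodal method.

```python
import random


def leaves(tree):
    return [tree] if isinstance(tree, str) else [x for child in tree for x in leaves(child)]


def clusters(tree):
    """Non-root clusters (frozensets of taxa) of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        c = frozenset().union(*(walk(child) for child in node))
        found.add(c)
        return c

    found.discard(walk(tree))  # the root cluster is shared by every tree
    return found


def similarity(t1, t2):
    """Fraction of clusters shared by two trees (Jaccard index); 1.0 means identical."""
    c1, c2 = clusters(t1), clusters(t2)
    return len(c1 & c2) / max(1, len(c1 | c2))


def relabel(tree, mapping):
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def randomization_test(t1, t2, n_reps=1000, seed=0):
    """Return (observed similarity, one-sided p-value) under random label shuffling."""
    rng = random.Random(seed)
    observed = similarity(t1, t2)
    taxa = sorted(leaves(t2))
    hits = 0
    for _ in range(n_reps):
        shuffled = taxa[:]
        rng.shuffle(shuffled)
        if similarity(t1, relabel(t2, dict(zip(taxa, shuffled)))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_reps + 1)


t = (((((("A", "B"), "C"), "D"), "E"), "F"), ("G", "H"))
print(randomization_test(t, t))  # similarity 1.0 with a small p-value
```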

14.
Summary: PCCA (phylogenetic canonical correlation analysis) is a new program for canonical correlation analysis of multivariate, continuously valued data from biological species. Canonical correlation analysis is a technique in which derived variables are obtained from two sets of original variables such that the correlations between corresponding derived variables are maximized. It is a very useful multivariate statistical method for calculating and analysing correlations between character sets. The program controls for species non-independence due to phylogenetic history, computes canonical coefficients, correlations and scores, and conducts hypothesis tests on the canonical correlations. It can also compute a multivariate version of Pagel's λ, which can then be used in the phylogenetic transformation. Availability: PCCA is distributed as DOS/Windows, Mac OS X and Linux/Unix executables with a detailed program manual and is freely available on the World Wide Web at: http://anolis.oeb.harvard.edu/~liam/programs/. Contact: lrevell{at}fas.harvard.edu Associate Editor: Keith Crandall
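
Pagel's λ rescales the off-diagonal entries of the phylogenetic covariance matrix, whose element C[i][j] is the branch length shared by species i and j from the root. A minimal Python sketch of the transformation itself; how PCCA estimates the multivariate λ is not shown:

```python
def pagel_lambda(C, lam):
    """Pagel's lambda transformation of a phylogenetic covariance matrix C.

    Off-diagonal (shared-history) entries are multiplied by lam; the diagonal is unchanged.
    lam = 1 leaves C as is; lam = 0 gives a star phylogeny (independent species).
    """
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


# Ultrametric tree ((A:1, B:1):1, C:2): C[i][j] = branch length shared from the root.
C = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 2.0]]
print(pagel_lambda(C, 0.5))  # [[2.0, 0.5, 0.0], [0.5, 2.0, 0.0], [0.0, 0.0, 2.0]]
```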

15.
Most bioinformatics tools require specialized input formats for sequence comparison and analysis. This is particularly true for molecular phylogeny programs, which accept only certain formats. In addition, it is often necessary to eliminate highly similar sequences from the input, especially when the dataset is large. Moreover, most programs impose restrictions on sequence names. Here we introduce SeqMaT, a Sequence Manipulation Tool. It provides the following functions: data format conversion, sequence name coding and decoding, removal of redundant and highly similar sequences, and data mining utilities. SeqMaT was developed in Java in two versions, web-based and standalone. The standalone program is convenient for manipulating large numbers of sequences, while the web version makes the tool widely available to researchers and practitioners over the Internet. AVAILABILITY: SeqMaT is freely available at http://glee.ist.unomaha.edu/seqmat.
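
Two of the listed functions can be sketched in a few lines of Python. The code below is a hypothetical illustration, not SeqMaT's Java code: names are replaced by short codes (the S0001 scheme is an assumption) so they fit restrictive name fields, and a greedy filter drops any sequence whose identity to an already kept one reaches a threshold (0.97 is an arbitrary example), assuming the sequences are aligned to equal length.

```python
import re


def encode_names(records, prefix="S"):
    """Replace sequence names with short codes (S0001, S0002, ...).

    Returns the coded records and a code -> original name table.
    """
    coded, table = [], {}
    for i, (name, seq) in enumerate(records, start=1):
        code = f"{prefix}{i:04d}"
        table[code] = name
        coded.append((code, seq))
    return coded, table


def decode_names(text, table, prefix="S"):
    """Restore original names in program output (e.g. a Newick tree)."""
    pattern = re.compile(rf"\b{re.escape(prefix)}\d{{4}}\b")
    return pattern.sub(lambda m: table.get(m.group(0), m.group(0)), text)


def identity(a, b):
    """Fraction of identical positions in two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_similar(records, threshold=0.97):
    """Greedy filter: keep a sequence only if it is below threshold identity to every kept one.

    Exact duplicates (identity 1.0) are always removed; the result depends on input order.
    """
    kept = []
    for name, seq in records:
        if all(identity(seq, other) < threshold for _, other in kept):
            kept.append((name, seq))
    return kept


records = [("Escherichia coli K-12", "ACGTACGTAC"),
           ("E. coli K-12 copy", "ACGTACGTAC"),
           ("Salmonella enterica", "ACGTTCGAAC")]
coded, table = encode_names(remove_similar(records))
print(coded)  # [('S0001', 'ACGTACGTAC'), ('S0002', 'ACGTTCGAAC')]
print(decode_names("(S0001,S0002);", table))  # (Escherichia coli K-12,Salmonella enterica);
```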

16.
SUMMARY: Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP format. It allows the user to interactively select genes and species from the input files. The concatenated result is stored in a single output file in PHYLIP or NEXUS format. AVAILABILITY: The computer program, including supporting files and example files, is available from http://www.dalicon.com/chimera/.
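
Concatenation itself is straightforward once each file is parsed. A minimal Python sketch, assuming sequential (non-interleaved) relaxed PHYLIP input with whitespace between name and sequence, and keeping by default only species present in every file; interactive selection and NEXUS output are omitted, and this is not Chimera's code:

```python
def read_phylip(text):
    """Parse a sequential, relaxed PHYLIP alignment: a header 'ntax nchar', then 'name sequence' lines."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    ntax, nchar = map(int, lines[0].split())
    seqs = {}
    for ln in lines[1:1 + ntax]:
        name, seq = ln.split(None, 1)
        seq = seq.replace(" ", "")
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(seq)}")
        seqs[name] = seq
    return seqs


def concatenate(alignments, species=None):
    """Join genes end to end for the chosen species (default: species present in every file)."""
    if species is None:
        species = sorted(set.intersection(*(set(a) for a in alignments)))
    return {sp: "".join(a[sp] for a in alignments) for sp in species}


def to_phylip(seqs):
    """Write sequential PHYLIP with names padded to the classic 10-character field."""
    nchar = len(next(iter(seqs.values())))
    rows = [f"{len(seqs)} {nchar}"] + [f"{name:<10} {seq}" for name, seq in seqs.items()]
    return "\n".join(rows) + "\n"


gene1 = "3 4\nHuman ACGT\nMouse ACGA\nFrog  TCGA\n"  # hypothetical input files
gene2 = "2 3\nHuman GGC\nMouse GGT\n"
print(to_phylip(concatenate([read_phylip(gene1), read_phylip(gene2)])), end="")
```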

17.
Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need for careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from the 454 GS FLX platform, on three bacterial data sets with different characteristics in terms of read coverage and repeat content. Our aim is to investigate their strengths and weaknesses in reconstructing the reference genomes. In our benchmarking, we assess the assemblers' performance, quantifying and characterizing assembly gaps and errors, and evaluating their ability to resolve complex genomic regions containing repeats. The final goal of this analysis is to highlight the pros and cons of each method, in order to provide end users with general criteria for choosing the appropriate assembly strategy for their specific needs. A further aspect we have explored is the relationship between the coverage of a sequencing project and the quality of the results obtained. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30×.
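
Planned coverage relates to sequencing effort through C = N·L/G (N reads of length L over a genome of size G). Under the idealized Lander-Waterman model, the expected fraction of bases left uncovered is e^(-C). The Python sketch below uses a hypothetical 5 Mb bacterial genome and an assumed 400-bp read length. The model ignores repeats and sequencing bias, which the abstract identifies as the real limits on assembly quality. It still illustrates one reason why coverage beyond roughly 20× buys little additional coverage of the genome.

```python
from math import ceil, exp


def reads_needed(genome_size, read_length, coverage):
    """Reads required for a target mean coverage, from C = N * L / G."""
    return ceil(coverage * genome_size / read_length)


def expected_uncovered_fraction(coverage):
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads: e^-C."""
    return exp(-coverage)


# Hypothetical project: 5 Mb bacterial genome, 400-bp reads.
for c in (10, 20, 30):
    print(c, reads_needed(5_000_000, 400, c), f"{expected_uncovered_fraction(c):.1e}")
# 10 125000 4.5e-05
# 20 250000 2.1e-09
# 30 375000 9.4e-14
```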

18.
Yu HT, Ma GC, Lee DJ, Chin SC, Chen TL, Tsao HS, Lin WH, Wu SH, Lin CC, Chen M. Theriogenology, 2012, 77(8):1615-1623
The objective was to apply a novel modification of a genome-wide comparative cytogenetic technique, comparative genomic hybridization (CGH), to study species belonging to the myrmecophagous (ant/termite-eating) mammalian orders/superorders (Pholidota, Tubulidentata, Carnivora, and Xenarthra), as a model for other applications in mammalian systematics and conservation biology. In this study, CGH was applied to high-quality metaphase spreads of pangolin (Pholidota), using probes of sloth and canine (Xenarthra and Carnivora, respectively) genomic DNA labeled with different fluorophores, thereby facilitating analysis of the visible color spectrum on pangolin karyotypes. Our results indicated that pholidotes are closer to carnivores than to xenarthrans, which confirmed the current consensus that myrmecophagy in these mammalian lineages more likely arose through homoplasy (convergent evolution) than as an ancestral character. Since the modified CGH technique is genome-wide, has chromosome-level resolution, and does not require full genome sequencing, it has considerable potential in systematics and other fields.

19.
The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimating the robustness of results, an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off false positives against false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on request.
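
The abstract does not describe the authors' algorithm in detail. For orientation only, the Python sketch below computes the breakpoint distance, one of the simplest rearrangement distances, between two genomes written as signed orders of syntenic blocks on a single linear chromosome (an assumption); it is not the authors' method.

```python
def adjacencies(genome):
    """Adjacencies of a signed linear block order; (x, y) and (-y, -x) denote the same one."""
    adj = set()
    for x, y in zip(genome, genome[1:]):
        adj.add((x, y))
        adj.add((-y, -x))  # the same adjacency read on the opposite strand
    return adj


def breakpoint_distance(a, b):
    """Number of adjacencies of genome a that are absent from genome b."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same blocks")
    b_adj = adjacencies(b)
    return sum((x, y) not in b_adj for x, y in zip(a, a[1:]))


print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))   # 2: one inversion, two breakpoints
print(breakpoint_distance([1, 2, 3, 4], [-4, -3, -2, -1]))  # 0: same order, other strand
```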


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417