Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
SUMMARY: ReMark is a fully automatic tool for clustering orthologs that combines a recursive clustering algorithm with the Markov clustering (MCL) algorithm. In the first step, ReMark detects and recursively clusters ortholog pairs through reciprocal BLAST best hits between multiple genomes (program RecursiveClustering.java). It then applies the MCL algorithm to the clusters (score matrices generated in the previous step) and refines them by adjusting an inflation factor (program MarkovClustering.java). The method has two key features. First, it uses the diagonal scores in the matrix of the initial ortholog clusters to obtain more reliable results. Second, it clusters orthologs flexibly, with the granularity controlled by MCL through the selected inflation factor. Users can therefore choose the state of the orthologous protein clusters that best fits their research interests by adjusting the inflation factor. AVAILABILITY AND IMPLEMENTATION: Source code for the orthologous protein clustering software is freely available for non-commercial use at http://dasan.sejong.ac.kr/~wikim/notice.html, implemented in Java 1.6 and supported on Windows and Linux.
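
The role of the inflation factor can be illustrated with a minimal sketch of generic Markov clustering. This is not the ReMark code (which is written in Java); the function names, the default parameters and the toy score matrix are assumptions made for illustration, and the input is assumed to be a symmetric, non-negative score matrix with a positive diagonal.

```python
def normalize_columns(matrix):
    """Rescale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, inflation):
    """Inflation: raise every entry to a power, then renormalize columns."""
    return normalize_columns([[x ** inflation for x in row] for row in matrix])


def mcl(scores, inflation=2.0, max_iter=100, tol=1e-9, threshold=1e-6):
    """Cluster a score matrix by alternating expansion and inflation."""
    current = normalize_columns(scores)
    for _ in range(max_iter):
        updated = inflate(expand(current), inflation)
        change = max(abs(a - b) for row_a, row_b in zip(updated, current)
                     for a, b in zip(row_a, row_b))
        current = updated
        if change < tol:
            break
    # Every row that keeps non-negligible mass defines one cluster:
    # the set of columns (proteins) attracted to that row.
    clusters = {frozenset(j for j, x in enumerate(row) if x > threshold)
                for row in current if max(row) > threshold}
    return sorted(sorted(c) for c in clusters)


# Toy example (hypothetical scores): two groups with no cross-similarity.
toy_scores = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(mcl(toy_scores))  # [[0, 1, 2], [3, 4]]
```

In general, a larger inflation value sharpens the contrast between strong and weak connections and tends to yield more, smaller clusters, which is the knob the abstract describes users adjusting.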

2.
3.
We present a simple method to detect pathogenicity islands and anomalous gene clusters in bacterial genomes. The method uses iterative discriminant analysis to define genomic regions that deviate most from the rest of the genome in three compositional criteria: G+C content, dinucleotide frequency and codon usage. Using this method, we identify many virulence-related gene islands, e.g. encoding protein secretion systems, adhesins, toxins, and other anomalous gene clusters, such as prophages. The program and the whole dataset, including the catalogs of genes in the detected anomalous segments, are publicly available at http://compbio.sibsnet.org/projects/pai-ida/. This program can be used to search for virulence-related factors in newly sequenced bacterial genomes.
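
To make the idea of a compositional criterion concrete, here is a minimal sketch that scans a sequence in fixed windows and flags windows whose G+C content deviates strongly from the mean. It is a much simpler stand-in for the iterative discriminant analysis described above and uses only one of the three criteria; the function names, window sizes and z-score cutoff are assumptions, not part of the published program.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(genome, window, step):
    """G+C content of each window, as (start position, gc) pairs."""
    return [(start, gc_content(genome[start:start + window]))
            for start in range(0, len(genome) - window + 1, step)]


def anomalous_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Windows whose G+C content lies at least z_cutoff standard deviations from the mean."""
    values = window_gc(genome, window, step)
    gcs = [gc for _, gc in values]
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(start, gc) for start, gc in values
            if abs(gc - mu) / sigma >= z_cutoff]


# Toy genome (hypothetical): an AT-only background with one G-rich segment.
toy_genome = "A" * 900 + "G" * 100
print(anomalous_windows(toy_genome, window=100, step=100))  # [(900, 1.0)]
```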

4.
Mika S, Rost B. Nucleic Acids Research, 2003, 31(13): 3789-3791
UniqueProt is a practical and easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm using the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution for bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site.
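
A greedy reduction of this kind can be sketched as follows. This is a generic illustration, not UniqueProt's actual procedure: it repeatedly drops the item with the most remaining similar neighbours until no similar pair is left. The `similar` predicate is a hypothetical placeholder for a similarity test such as an HSSP-value threshold.

```python
def greedy_representatives(items, similar):
    """Return a subset of items in which no two members are similar.

    items must be hashable; similar(a, b) is treated as symmetric.
    """
    neighbours = {
        a: {b for b in items if b != a and (similar(a, b) or similar(b, a))}
        for a in items
    }
    while True:
        # Pick the item that still conflicts with the most other items.
        worst = max(neighbours, key=lambda a: len(neighbours[a]), default=None)
        if worst is None or not neighbours[worst]:
            break  # no similar pairs remain
        for b in neighbours.pop(worst):
            neighbours[b].discard(worst)
    return [a for a in items if a in neighbours]


# Toy example (hypothetical): "a" is similar to both "b" and "c".
similar_pairs = {frozenset(("a", "b")), frozenset(("a", "c"))}
result = greedy_representatives(
    ["a", "b", "c", "d"],
    lambda x, y: frozenset((x, y)) in similar_pairs,
)
print(result)  # ['b', 'c', 'd']
```

Removing the most connected item first keeps three sequences here, whereas keeping "a" would have forced both "b" and "c" out.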

5.
MOTIVATION: Increasingly, a gene is represented by a large number of sequences in GenBank. This high redundancy makes it very difficult to identify a unique best match for a query sequence from its BLAST results. We developed a novel program, UniBLAST, that filters out uninformative hits, clusters the redundant hits, groups the hits by LocusLink, and graphically displays the results. We also implemented a scoring function in UniBLAST to assign a unique gene name to a query sequence. UniBLAST significantly increases the efficiency of gene annotation. AVAILABILITY: The program is available at http://south.genomics.org.cn/software/uniblast/index.html. CONTACT: uniblast@genomics.org.cn; wei@nexusgenomics.com

6.
SUMMARY: I describe a parallel implementation of Rogers' mismatch algorithm, a method for making inferences about demographic history from DNA sequence data. The program is distributed on clusters of workstations, providing a substantial speedup and low execution times on large numbers of nodes. AVAILABILITY: Source code and documentation are available at http://mombasa.anthro.utah.edu/wooding/ CONTACT: stephen.wooding@anthro.utah.edu

7.
Machaon CVE: cluster validation for gene expression data   (Total citations: 2; self-citations: 0; citations by others: 2)
SUMMARY: This paper presents a cluster validation tool for gene expression data. The Machaon CVE (Clustering and Validation Environment) system aims to partition samples or genes into groups characterized by similar expression patterns, and to evaluate the quality of the clusters obtained. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html SUPPLEMENTARY INFORMATION: http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html

8.
MOTIVATION: Clustering microarray gene expression data is a powerful tool for elucidating co-regulatory relationships among genes. Many different clustering techniques have been successfully applied and the results are promising. However, substantial fluctuation contained in microarray data, lack of knowledge on the number of clusters and complex regulatory mechanisms underlying biological systems make the clustering problems tremendously challenging. RESULTS: We devised an improved model-based Bayesian approach to cluster microarray gene expression data. Cluster assignment is carried out by an iterative weighted Chinese restaurant seating scheme such that the optimal number of clusters can be determined simultaneously with cluster assignment. The predictive updating technique was applied to improve the efficiency of the Gibbs sampler. An additional step is added during reassignment to allow genes that display complex correlation relationships, such as time-shifted and/or inverted patterns, to be clustered together. Analysis of a real dataset showed that as much as 30% of significant genes clustered in the same group display complex relationships with the consensus pattern of the cluster. Other notable features include automatic handling of missing data, quantitative measures of cluster strength and assignment confidence. Synthetic and real microarray gene expression datasets were analyzed to demonstrate its performance. AVAILABILITY: A computer program named Chinese restaurant cluster (CRC) has been developed based on this algorithm. The program can be downloaded at http://www.sph.umich.edu/csg/qin/CRC/.
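
The seating scheme referred to above builds on the Chinese restaurant process, in which the number of clusters is not fixed in advance. The sketch below shows only the plain, unweighted process as a prior over partitions; it omits the likelihood weighting, the Gibbs sampler and the predictive updating described in the abstract. The function name and the concentration parameter `alpha` are assumptions for illustration.

```python
import random


def crp_seating(n_customers, alpha, rng):
    """Seat customers one by one under a plain Chinese restaurant process.

    A new customer joins existing table k with probability proportional to
    the number of customers already there, or opens a new table with
    probability proportional to alpha (which must be positive).
    """
    tables = []      # tables[k] = number of customers at table k
    assignment = []  # assignment[i] = table index of customer i
    for _ in range(n_customers):
        weights = tables + [alpha]  # last entry stands for a new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
        assignment.append(k)
    return assignment, tables


# Example run; the seed is arbitrary and only makes the draw repeatable.
assignment, tables = crp_seating(20, alpha=1.0, rng=random.Random(0))
print(len(tables), "tables with sizes", tables)
```

In a model-based clustering setting, genes play the role of customers and clusters the role of tables, so the number of clusters emerges from the seating rather than being chosen beforehand.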

9.
10.
The conserved hydrophobic core is an important feature of a family of protein domains. We suggest a procedure for finding and analyzing conserved hydrophobic cores. The procedure is based on an original program called CluD (http://monkey.belozersky.msu.ru/CluD/cgi-bin/hftri.pl). Conserved hydrophobic cores of several families, including homeodomains and interlock-containing domains, are described. Hydrophobic clusters on some protein-DNA and protein-protein interfaces were also analyzed.

11.
Question: Community ecologists are often confronted with multiple possible partitions of a single set of records of species composition and/or abundances from several sites. Different methods of numerical classification produce different results, and the question is which of them, and how many clusters, should be selected for interpretation. We demonstrate a new method for identifying the optimal partition from a series of partitions of the same set of sites, based on the number of species with high fidelity to clusters in a partition (faithful species). Methods: The new method, OptimClass, has two variants. OptimClass 1 searches for the partition with the maximum number of faithful species across all clusters, while OptimClass 2 searches for the partition with the maximum number of clusters that contain at least a preselected minimum number of faithful species. Faithful species are determined based on the P value of Fisher's exact test, used as a measure of fidelity. OptimClass was tested on three vegetation datasets that varied in species richness and internal heterogeneity, using several classification algorithms, resemblance measures and cover transformations. Results: Results from both variants of OptimClass depended on the preselected threshold P value for faithful species: higher P gave higher probability that a partition with more clusters was selected as optimal. Good partitions, in terms of OptimClass criteria, involved flexible beta clustering, and also ordinal clustering. Good partitions were also obtained with TWINSPAN when the required number of clusters was small, or UPGMA when the required number of clusters was large. Poor partitions usually resulted from classifications that used resemblance measures and cover transformations emphasizing differences in species cover; this is not unexpected because OptimClass uses a presence/absence-based fidelity measure.
Conclusions: If the aim of a classification is to obtain clusters rich in faithful species, which can be subsequently used as diagnostic species for identification of community types, OptimClass is a suitable method for simultaneous choice of the optimal classification algorithm and optimal number of clusters. It can be computed in the JUICE program.
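
The two OptimClass variants can be sketched as follows. This is an illustrative reading of the abstract, not the JUICE implementation: it assumes a one-sided (right-tail) Fisher's exact test on species presence/absence, and the data layout and function names are hypothetical.

```python
from math import comb


def fisher_right_tail(k, cluster_size, species_count, n_sites):
    """P(X >= k) when species_count occurrences fall at random among n_sites,
    where X counts the occurrences landing in a cluster of cluster_size sites."""
    upper = min(cluster_size, species_count)
    total = comb(n_sites, cluster_size)
    tail = sum(comb(species_count, x) * comb(n_sites - species_count, cluster_size - x)
               for x in range(k, upper + 1))
    return tail / total


def faithful_per_cluster(presence, partition, p_threshold):
    """Count faithful species per cluster.

    presence:  dict species -> set of sites where the species occurs
    partition: dict site -> cluster label (must cover every site used above)
    """
    clusters = {}
    for site, label in partition.items():
        clusters.setdefault(label, set()).add(site)
    counts = {label: 0 for label in clusters}
    for sites in presence.values():
        for label, members in clusters.items():
            k = len(sites & members)
            p = fisher_right_tail(k, len(members), len(sites), len(partition))
            if p <= p_threshold:
                counts[label] += 1
    return counts


def optimclass1(presence, partitions, p_threshold):
    """Name of the partition with the most faithful species over all clusters."""
    return max(partitions, key=lambda name: sum(
        faithful_per_cluster(presence, partitions[name], p_threshold).values()))


def optimclass2(presence, partitions, p_threshold, min_faithful):
    """Name of the partition with the most clusters having >= min_faithful faithful species."""
    def score(name):
        counts = faithful_per_cluster(presence, partitions[name], p_threshold)
        return sum(1 for c in counts.values() if c >= min_faithful)
    return max(partitions, key=score)
```

With eight sites, a species present at exactly the four sites of one cluster gets P = 1/70 under this test, so it counts as faithful at a threshold of 0.05; the same species split evenly between two clusters does not.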

12.
MOTIVATION: The program MBBC 2.0 clusters time-course microarray data using a Bayesian product partition model. RESULTS: The Bayesian product partition model in Booth et al. (2007) simultaneously searches for the optimal number of clusters, and assigns cluster memberships based on temporal changes of gene expressions. MBBC 2.0 makes this method easily available for statisticians and scientists, and is built with three free programming languages and software packages: Ox, R and C++, taking advantage of the strengths of each language. Within MBBC, the search algorithm is implemented with Ox and resulting graphs are drawn with R. A user-friendly graphical interface is built with C++ to run the Ox and R programs internally. Thus, MBBC users are not required to know how to use Ox, R or C++, but Ox and R must be pre-installed. AVAILABILITY: A self-extractable zip file, MBBC20zip.exe, is available at the MBBC webpage www.stat.ufl.edu/~casella/mbbc/, which contains MBBC.exe, source files, and all other related files. The current version works only in the Windows operating system. A free installation program and overview for Ox is available at www.doornik.com. A detailed installation guide for Ox is provided by MBBC, and is accessible without installing Ox. R is available at www.r-project.org/.

13.
DNA replication in higher eukaryotes initiates at thousands of origins according to a spatio-temporal program. The ATR/Chk1 dependent replication checkpoint inhibits the activation of later firing origins. In the Xenopus in vitro system initiations are not sequence dependent and 2-5 origins are grouped in clusters that fire at different times despite a very short S phase. We have shown that the temporal program is stochastic at the level of single origins and replication clusters. It is unclear how the replication checkpoint inhibits late origins but permits origin activation in early clusters. Here, we analyze the role of Chk1 in the replication program in sperm nuclei replicating in Xenopus egg extracts by a combination of experimental and modelling approaches. After Chk1 inhibition or immunodepletion, we observed an increase of the replication extent and fork density in the presence or absence of external stress. However, overexpression of Chk1 in the absence of external replication stress inhibited DNA replication by decreasing fork densities due to lower Cdk2 kinase activity. Thus, Chk1 levels need to be tightly controlled in order to properly regulate the replication program even during normal S phase. DNA combing experiments showed that Chk1 inhibits origins outside, but not inside, already active clusters. Numerical simulations of initiation frequencies in the absence and presence of Chk1 activity are consistent with a global inhibition of origins by Chk1 at the level of clusters but need to be combined with a local repression of Chk1 action close to activated origins to fit our data.

14.
SUMMARY: TREE-PUZZLE is a program package for quartet-based maximum-likelihood phylogenetic analysis (formerly PUZZLE, Strimmer and von Haeseler, Mol. Biol. Evol., 13, 964-969, 1996) that provides methods for reconstruction, comparison, and testing of trees and models on DNA as well as protein sequences. To reduce waiting time for larger datasets the tree reconstruction part of the software has been parallelized using message passing that runs on clusters of workstations as well as parallel computers. AVAILABILITY: http://www.tree-puzzle.de. The program is written in ANSI C. TREE-PUZZLE can be run on UNIX, Windows and Mac systems, including Mac OS X. To run the parallel version of PUZZLE, a Message Passing Interface (MPI) library has to be installed on the system. Free MPI implementations are available on the Web (cf. http://www.lam-mpi.org/mpi/implementations/).

15.
During S-phase of the cell cycle, chromosomal DNA is replicated according to a complex replication timing program, with megabase-sized domains replicating at different times. DNA fibre analysis reveals that clusters of adjacent replication origins fire near-synchronously. Analysis of replicating cells by light microscopy shows that DNA synthesis occurs in discrete foci or factories. The relationship between timing domains, origin clusters and replication foci is currently unclear. Recent work, using a hybrid Xenopus/hamster replication system, has shown that when CDK levels are manipulated during S-phase the activation of replication factories can be uncoupled from progression through the replication timing program. Here, we use data from this hybrid system to investigate potential relationships between timing domains, origin clusters and replication foci. We suggest that each timing domain typically comprises several replicon clusters, which are usually processed sequentially by replication factories. We discuss how replication might be regulated at different levels to create this complex organisation and the potential involvement of CDKs in this process.

16.
Molecular dynamics (MD) calculations have been performed on the aggregation of clusters with up to 128 Y-shaped perfluoroalkylated molecules of the type C10F20[C7H15]2 (Y-A/128) and C10H20[C7F15]2 (Y-B/128) as well as mixed clusters (Y-A/64+Y-B/64) using the AMBER 5 program. The effect of the segregation tendency of the chemically different parts and the influence of the steric repulsion due to the wedge shape of the molecules on the structure formation have been studied. The results have been analyzed by snapshots, radial atom pair distribution functions, orientational correlation functions as well as diffusion coefficients and are compared with the corresponding findings on clusters of alkanes and perfluoroalkanes. Electronic supplementary material to this paper can be obtained by using the Springer LINK server located at http://dx.doi.org/10.1007/s008940020092y.

17.
SEARCHPKS is a software tool for the detection and analysis of polyketide synthase (PKS) domains in a polypeptide sequence. Modular polyketide synthases are unusually large multi-enzymatic multi-domain megasynthases, which are involved in the biosynthesis of pharmaceutically important natural products using an assembly-line mechanism. This program facilitates easy identification of various PKS domains and modules from a given polypeptide sequence. In addition, it predicts the specificity of the potential acyltransferase domains for various starter and extender precursor units. SEARCHPKS is a user-friendly tool for correlating polyketide chemical structures with the organization of domains and modules in the corresponding modular polyketide synthases. This program also allows the user to extensively analyze and assess the sequence homology of various polyketide synthase domains, thus providing guidelines for carrying out domain and module swapping experiments. SEARCHPKS can also aid in identification of polyketide products made by PKS clusters found in newly sequenced genomes. The computational approach used in SEARCHPKS is based on a comprehensive analysis of various characterized clusters of modular polyketide synthases compiled in PKSDB, a database of modular polyketide synthases. SEARCHPKS can be accessed at http://www.nii.res.in/searchpks.html.

18.
MOTIVATION: Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering algorithms, such as BAPS, STRUCTURE and TESS, individual multilocus genotypes are partitioned over a set of clusters, often using unsupervised approaches that involve stochastic simulation. As a result, replicate cluster analyses of the same data may produce several distinct solutions for estimated cluster membership coefficients, even though the same initial conditions were used. Major differences among clustering solutions have two main sources: (1) 'label switching' of clusters across replicates, caused by the arbitrary way in which clusters in an unsupervised analysis are labeled, and (2) 'genuine multimodality,' truly distinct solutions across replicates. RESULTS: To facilitate the interpretation of population-genetic clustering results, we describe three algorithms for aligning multiple replicate analyses of the same data set. We have implemented these algorithms in the computer program CLUMPP (CLUster Matching and Permutation Program). We illustrate the use of CLUMPP by aligning the cluster membership coefficients from 100 replicate cluster analyses of 600 chickens from 20 different breeds. AVAILABILITY: CLUMPP is freely available at http://rosenberglab.bioinformatics.med.umich.edu/clumpp.html.
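
Label switching can be undone by permuting the cluster columns of one replicate so that it best matches another. The sketch below tries every permutation and keeps the one with the smallest sum of squared differences; both the exhaustive search over two replicates and the squared-difference criterion are simplifying assumptions for illustration and are not claimed to be CLUMPP's own algorithms or similarity measure. The function name is hypothetical.

```python
from itertools import permutations


def align_columns(reference, replicate):
    """Reorder the columns of replicate to match reference as closely as possible.

    Both arguments are membership matrices: one row per individual,
    one column per cluster, with the same shape.
    Returns (best permutation, aligned replicate).
    """
    k = len(reference[0])

    def cost(perm):
        # Sum of squared differences after mapping column perm[j] onto column j.
        return sum((ref_row[j] - rep_row[perm[j]]) ** 2
                   for ref_row, rep_row in zip(reference, replicate)
                   for j in range(k))

    best = min(permutations(range(k)), key=cost)
    aligned = [[row[best[j]] for j in range(k)] for row in replicate]
    return best, aligned


# Toy example (hypothetical coefficients): the replicate has its labels swapped.
run_1 = [[0.9, 0.1], [0.2, 0.8]]
run_2 = [[0.1, 0.9], [0.8, 0.2]]
print(align_columns(run_1, run_2))  # ((1, 0), [[0.9, 0.1], [0.2, 0.8]])
```

Exhaustive search grows factorially with the number of clusters, so it is only practical for small K; that cost is one reason to have more than one alignment algorithm available.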

19.
20.
Summary: CLUSLA, a computer program for the clustering of very large phytosociological data sets, is described. It is an elaboration of Janssen's (1975) simple procedure. The essence of the program is the creation of clusters, each starting with one relevé, as the relevés are entered in the program. Each new relevé that is sufficiently distinct from already existing clusters is considered a new cluster. The fusion criterion is the attainment of a certain level of (dis)similarity between relevé and cluster. Bray and Curtis' dissimilarity measure with presence-absence data was used. The program, written in FORTRAN for an IBM 370–158 system, can deal with practically unlimited numbers of relevés, provided the product of the number of primary clusters and the number of species does not exceed 140,000. We adopted maxima of 100 and 1400 respectively. After the primary clustering round a reallocation is performed. Then a simple table is printed with information on the significance of occurrence of species in clusters according to a chi-square approach. The primary clusters can be treated again with a higher fusion threshold, or approached with more elaborate methods, in our case particularly the TABORD program. The program is demonstrated with a collection of 6072 relevés with 889 species of salt marsh vegetation from the Working-Group for Data-Processing. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. Nomenclature follows the Trieste system, which will be published later. The authors are very grateful to Drs. Jan Janssen, Mike Dale, László Orlóci and Mike Austin for their comments on drafts of the program, and to Wil Kortekaas for her help in the interpretation of the tables.
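
The primary clustering round described above can be sketched as a single pass over the relevés. This is an illustrative reconstruction, not the FORTRAN program: it assumes each cluster is represented by its species frequency profile, that a relevé joins the nearest cluster when the Bray-Curtis dissimilarity is at or below the threshold, and it omits the reallocation round and the chi-square table. Function names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two species -> value mappings."""
    species = set(a) | set(b)
    diff = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in species)
    total = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in species)
    return diff / total if total else 0.0


def sequential_clustering(releves, threshold):
    """Assign each relevé (a set of species) to the nearest cluster, or start a new one.

    Returns a list of clusters, each a list of relevé indices in input order.
    """
    clusters = []  # each: {"members": [indices], "counts": {species: occurrences}}
    for index, species_set in enumerate(releves):
        profile = {s: 1.0 for s in species_set}  # presence-absence data
        best, best_d = None, None
        for cluster in clusters:
            size = len(cluster["members"])
            centroid = {s: n / size for s, n in cluster["counts"].items()}
            d = bray_curtis(profile, centroid)
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best["members"].append(index)
            for s in species_set:
                best["counts"][s] = best["counts"].get(s, 0) + 1
        else:
            clusters.append({"members": [index],
                             "counts": {s: 1 for s in species_set}})
    return [c["members"] for c in clusters]


# Toy example (hypothetical species codes).
toy_releves = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
print(sequential_clustering(toy_releves, threshold=0.5))  # [[0, 1], [2, 3]]
```

Because each relevé is compared only with the existing clusters rather than with every other relevé, memory use scales with the number of clusters times the number of species, which matches the limit quoted in the abstract.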
