Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Functional annotation from predicted protein interaction networks   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Progress in large-scale experimental determination of protein-protein interaction networks for several organisms has resulted in innovative methods of functional inference based on network connectivity. However, the amount of effort and resources required for the elucidation of experimental protein interaction networks is prohibitive. Previously we, and others, have developed techniques to predict protein interactions for novel genomes using computational methods and data generated from other genomes. RESULTS: We evaluated the performance of a network-based functional annotation method that makes use of our predicted protein interaction networks. We show that this approach performs equally well on experimentally derived and predicted interaction networks, for both manually and computationally assigned annotations. We applied the method to predicted protein interaction networks for over 50 organisms from all domains of life, providing annotations for many previously unannotated proteins and verifying existing low-confidence annotations. AVAILABILITY: Functional predictions for over 50 organisms are available at http://bioverse.compbio.washington.edu and datasets used for analysis at http://data.compbio.washington.edu/misc/downloads/nannotation_data/. SUPPLEMENTARY INFORMATION: A supplemental appendix gives additional details not in the main text. (http://data.compbio.washington.edu/misc/downloads/nannotation_data/supplement.pdf).
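
The abstract above does not say which network-based annotation rule was evaluated. As a rough illustration of the general idea only, the sketch below assumes a simple majority vote over annotated interaction partners; the function name `annotate_by_neighbors`, the protein identifiers, and the function labels are all hypothetical.

```python
from collections import Counter

def annotate_by_neighbors(edges, known_functions):
    """Predict a function for each unannotated protein by majority vote
    among its annotated interaction partners (an assumed, simplified rule)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    predictions = {}
    for protein, partners in neighbors.items():
        if protein in known_functions:
            continue  # already annotated
        votes = Counter(known_functions[p] for p in partners if p in known_functions)
        if votes:
            # Highest vote count wins; ties are broken alphabetically.
            best = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
            predictions[protein] = best[0]
    return predictions

# Hypothetical toy network: P1 and P4 are unannotated.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
known = {"P2": "kinase", "P3": "kinase", "P5": "transport"}
predictions = annotate_by_neighbors(edges, known)
```

Because the rule only needs an edge list, the same function would run unchanged on an experimentally derived or a predicted network, which is the kind of comparison the abstract reports.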

2.
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.
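
The step "read sequences are then stored as positions within the assembled contigs" can be illustrated with a heavily simplified sketch. It assumes exact substring matches only and keeps unmatched reads verbatim; it is not Quip's actual encoding, and `encode_reads`, `decode_reads`, and the toy sequences are hypothetical.

```python
def encode_reads(reads, contigs):
    """Replace each read by (contig index, offset, length) when it occurs
    exactly in a contig; otherwise keep the raw sequence."""
    encoded = []
    for read in reads:
        for contig_id, contig in enumerate(contigs):
            offset = contig.find(read)
            if offset != -1:
                encoded.append(("pos", contig_id, offset, len(read)))
                break
        else:
            # No contig contains the read: store it unchanged (still lossless).
            encoded.append(("raw", read))
    return encoded

def decode_reads(encoded, contigs):
    """Invert encode_reads, recovering the original read sequences."""
    reads = []
    for record in encoded:
        if record[0] == "pos":
            _, contig_id, offset, length = record
            reads.append(contigs[contig_id][offset:offset + length])
        else:
            reads.append(record[1])
    return reads

# Hypothetical toy data.
contigs = ["ACGTACGTTAGC", "GGGCCCAAATTT"]
reads = ["GTACG", "CCCAAA", "TTTTTT"]
encoded = encode_reads(reads, contigs)
decoded = decode_reads(encoded, contigs)
```

A position record is three small integers regardless of read length, which is where the space saving comes from; a real compressor would also have to handle mismatches and reverse complements.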

3.
Model-based clustering and data transformations for gene expression data.   Total citations: 20, self-citations: 0, citations by others: 20
MOTIVATION: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications. RESULTS: We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. We also explored the validity of the Gaussian mixture assumption on different transformations of real data. We also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits. AVAILABILITY: MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development. CONTACT: kayee@cs.washington.edu. SUPPLEMENTARY INFORMATION: http://www.cs.washington.edu/homes/kayee/model.

4.
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido.
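
To make the point about "a few highly connected subgraphs" concrete, the sketch below splits a peptide-protein graph into connected components and orders them by size, so the largest (most expensive) subproblems stand out. It is an illustrative breadth-first search, not the paper's code; the function name and the peptide and protein identifiers are hypothetical.

```python
from collections import defaultdict, deque

def connected_components(peptide_to_proteins):
    """Return the connected components of the bipartite peptide-protein
    graph, largest first. Nodes are tagged tuples to keep the two sides apart."""
    graph = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        pep_node = ("peptide", peptide)
        graph[pep_node]  # make sure peptides without proteins still appear
        for protein in proteins:
            prot_node = ("protein", protein)
            graph[pep_node].add(prot_node)
            graph[prot_node].add(pep_node)

    seen = set()
    components = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical peptide-to-protein mapping.
mapping = {
    "pepA": ["P1", "P2"],
    "pepB": ["P2", "P3"],
    "pepC": ["P3"],
    "pepD": ["P4"],
}
components = connected_components(mapping)
```

Each component can be handled as an independent inference problem, so the size distribution of `components` gives a first estimate of where the computational cost concentrates.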

5.
SUMMARY: ClaNC (classification to nearest centroids) is a simple and accurate method for classifying microarrays. This document introduces a point-and-click interface to the ClaNC methodology. The software is available as an R package. AVAILABILITY: ClaNC is freely available from http://students.washington.edu/adabney/clanc

6.
SUMMARY: We have developed Look-Align, an interactive web-based viewer to display pre-computed multiple sequence alignments. Although initially developed to support the visualization needs of the maize diversity website Panzea (http://www.panzea.org), the viewer is a generic stand-alone tool that can be easily integrated into other websites. AVAILABILITY: Look-Align is written in Perl using open-source components and is available under an open-source license. Live installation and download information can be found at the Panzea website (http://www.panzea.org/software/alignment_viewer.html). CONTACT: ware@cshl.edu SUPPLEMENTARY INFORMATION: The Supplementary information includes sample lists of multiple sequence alignment software and sample screenshots of the viewer.

7.
NetSeed is a web tool and Perl module for analyzing the topology of metabolic networks and calculating the set of exogenously acquired compounds. NetSeed is based on the seed detection algorithm, developed and validated in previous studies. AVAILABILITY: The NetSeed web-based tool, open-source Perl module, examples and documentation are freely available online at: http://depts.washington.edu/elbogs/NetSeed.
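
The abstract does not describe the seed detection algorithm itself. The sketch below assumes one common formulation: compounds are nodes in a directed network, and seed sets are the strongly connected components that receive no edge from outside themselves, since such compounds cannot be produced from the rest of the network. This is an illustrative Python reading, not the NetSeed Perl module; `seed_sets` and the compound names are hypothetical.

```python
def reachable(graph, start):
    """All nodes reachable from start (including start) by directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def seed_sets(graph):
    """Return the source strongly connected components of a directed graph
    given as {substrate: [products]}. Quadratic, for small examples only."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    reach = {n: reachable(graph, n) for n in nodes}

    # Two nodes share a component when each can reach the other.
    assigned = set()
    components = []
    for n in sorted(nodes):
        if n in assigned:
            continue
        comp = frozenset(m for m in reach[n] if n in reach[m])
        assigned.update(comp)
        components.append(comp)

    seeds = []
    for comp in components:
        has_external_input = any(
            target in comp and source not in comp
            for source, targets in graph.items()
            for target in targets
        )
        if not has_external_input:
            seeds.append(comp)
    return sorted(seeds, key=sorted)

# Hypothetical toy network: substrate -> products.
network = {
    "glucose": ["g6p"],
    "g6p": ["pyruvate"],
    "cofactor_a": ["cofactor_b"],
    "cofactor_b": ["cofactor_a", "pyruvate"],
}
seeds = seed_sets(network)
```

In this toy network `glucose` must be acquired from the environment, and the two cofactors form a cycle with no external input, so at least one of them must be acquired as well.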

8.
9.
Functional metagenomic analyses commonly involve a normalization step, where measured levels of genes or pathways are converted into relative abundances. Here, we demonstrate that this normalization scheme introduces marked biases both across and within human microbiome samples, and identify sample- and gene-specific properties that contribute to these biases. We introduce an alternative normalization paradigm, MUSiCC, which combines universal single-copy genes with machine learning methods to correct these biases and to obtain an accurate and biologically meaningful measure of gene abundances. Finally, we demonstrate that MUSiCC significantly improves downstream discovery of functional shifts in the microbiome. MUSiCC is available at http://elbo.gs.washington.edu/software.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0610-8) contains supplementary material, which is available to authorized users.
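
The normalization idea in the MUSiCC abstract can be sketched in a few lines. The version below assumes that each gene's abundance is divided by the median abundance of the universal single-copy genes in the same sample, so that values read as approximate copies per genome. It leaves out the machine learning correction entirely and is not the MUSiCC implementation; the function name and gene identifiers are hypothetical.

```python
from statistics import median

def normalize_by_single_copy_genes(sample_counts, single_copy_genes):
    """Scale gene abundances by the median abundance of universal
    single-copy genes found in the sample (assumed normalization rule)."""
    reference = median(
        sample_counts[g] for g in single_copy_genes if g in sample_counts
    )
    if reference <= 0:
        raise ValueError("single-copy gene abundance must be positive")
    return {gene: count / reference for gene, count in sample_counts.items()}

# Hypothetical sample: three single-copy markers and two other genes.
sample = {
    "uscg_1": 10.0,
    "uscg_2": 12.0,
    "uscg_3": 8.0,
    "gene_x": 30.0,
    "gene_y": 5.0,
}
normalized = normalize_by_single_copy_genes(sample, ["uscg_1", "uscg_2", "uscg_3"])
```

Unlike conversion to relative abundance, this scaling does not force the values in a sample to sum to one, so a change in one gene does not mechanically shift all the others.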

10.
We present a Markov chain Monte Carlo coalescent genealogy sampler, LAMARC 2.0, which estimates population genetic parameters from genetic data. LAMARC can co-estimate subpopulation Theta = 4N(e)mu, immigration rates, subpopulation exponential growth rates and overall recombination rate, or a user-specified subset of these parameters. It can perform either maximum-likelihood or Bayesian analysis, and accommodates nucleotide sequence, SNP, microsatellite or electrophoretic data, with resolved or unresolved haplotypes. It is available as portable source code and executables for all three major platforms. AVAILABILITY: LAMARC 2.0 is freely available at http://evolution.gs.washington.edu/lamarc

11.
Summary: We present a large-scale implementation of the RANKPROP protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the RANKPROP algorithm, whereas previously, results were only reported for a database of 108 000 proteins. We also describe two algorithmic improvements to the original algorithm, including propagation from multiple homologs of the query and better normalization of ranking scores, that lead to higher accuracy and to scores with a probabilistic interpretation. Availability: The RANKPROP web server and source code are available at http://rankprop.gs.washington.edu Contact: iain@nec-labs.com; noble@gs.washington.edu Associate Editor: Burkhard Rost

12.
In this report, we compare and contrast three previously published Bayesian methods for inferring haplotypes from genotype data in a population sample. We review the methods, emphasizing the differences between them in terms of both the models ("priors") they use and the computational strategies they employ. We introduce a new algorithm that combines the modeling strategy of one method with the computational strategies of another. In comparisons using real and simulated data, this new algorithm outperforms all three existing methods. The new algorithm is included in the software package PHASE, version 2.0, available online (http://www.stat.washington.edu/stephens/software.html).

13.
ViroBLAST is a stand-alone BLAST web interface for nucleotide and amino acid sequence similarity searches. It extends the utility of BLAST to query against multiple sequence databases and user sequence datasets, and provides a friendly output to easily parse and navigate BLAST results. ViroBLAST is readily useful for all research areas that require BLAST functions and is available online and as a downloadable archive for independent installation. Availability: http://indra.mullins.microbiol.washington.edu/blast/viroblast.php.

14.
Unsupervised segmentation of continuous genomic data   Total citations: 2, self-citations: 0, citations by others: 2
The advent of high-density, high-volume genomic data has created the need for tools to summarize large datasets at multiple scales. HMMSeg is a command-line utility for the scale-specific segmentation of continuous genomic data using hidden Markov models (HMMs). Scale specificity is achieved by an optional wavelet-based smoothing operation. HMMSeg is capable of handling multiple datasets simultaneously, rendering it ideal for integrative analysis of expression, phylogenetic and functional genomic data. AVAILABILITY: http://noble.gs.washington.edu/proj/hmmseg
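
HMM-based segmentation of a continuous signal can be illustrated with a small Viterbi decoder. The sketch below assumes Gaussian emissions with fixed, hand-chosen means and standard deviations and a single self-transition probability; it performs no parameter training and no wavelet smoothing, and it is not HMMSeg's code. All names and values are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_segment(values, means, sds, stay_prob):
    """Most likely state path for a Gaussian-emission HMM with uniform
    start probabilities and a shared self-transition probability."""
    n_states = len(means)
    switch_prob = (1 - stay_prob) / (n_states - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch_prob) for j in range(n_states)]
        for i in range(n_states)
    ]
    log_start = math.log(1 / n_states)

    scores = [
        log_start + gaussian_log_pdf(values[0], means[s], sds[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in values[1:]:
        new_scores, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this position.
            best_prev = max(range(n_states), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev] + log_trans[best_prev][j]
                + gaussian_log_pdf(x, means[j], sds[j])
            )
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical signal with a raised region in the middle.
signal = [0.1, -0.2, 0.0, 2.1, 1.9, 2.2, 0.1]
path = viterbi_segment(signal, means=[0.0, 2.0], sds=[0.5, 0.5], stay_prob=0.9)
```

Raising `stay_prob` makes state changes more costly and yields longer segments, which loosely mirrors the role of scale in the abstract, although HMMSeg achieves scale specificity through wavelet smoothing instead.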

15.
Sinha S, Tompa M. Nucleic Acids Research 2003, 31(13):3586-3588
A fundamental challenge facing biologists is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be coregulated. The program YMF identifies good candidates for such binding sites by searching for statistically overrepresented motifs. More specifically, YMF enumerates all motifs in the search space and is guaranteed to produce those motifs with greatest z-scores. This note describes the YMF web software, available at http://bio.cs.washington.edu/software.html.
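
A motif z-score measures how far the observed number of occurrences lies above what a background model expects, in units of standard deviation. The sketch below uses a deliberately crude background: independent bases with given frequencies and a binomial variance that ignores overlapping occurrences. The abstract does not state YMF's background model, so this is an assumption made for illustration; `motif_z_score` and the example sequence are hypothetical.

```python
import math

def motif_z_score(sequences, motif, base_freqs):
    """Return (observed count, expected count, z-score) for an exact motif
    under an independent-base background with a binomial approximation."""
    observed = 0
    positions = 0
    for seq in sequences:
        n = len(seq) - len(motif) + 1
        if n <= 0:
            continue  # sequence shorter than the motif
        positions += n
        observed += sum(1 for i in range(n) if seq[i:i + len(motif)] == motif)

    # Probability that the motif starts at any given position.
    p = math.prod(base_freqs[base] for base in motif)
    expected = positions * p
    sd = math.sqrt(positions * p * (1 - p))
    if sd == 0:
        raise ValueError("z-score undefined: zero variance")
    return observed, expected, (observed - expected) / sd

# Hypothetical input with uniform base frequencies.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
observed, expected, z = motif_z_score(["ACACACAC"], "AC", uniform)
```

Ranking every candidate motif by this score and keeping the top few gives the flavor of an enumerative search, although a realistic tool would need a better variance estimate and support for degenerate motifs.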

16.
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the best conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called FootPrinter and is available at http://bio.cs.washington.edu/software.html.

17.

Background

Host-microbe and microbe-microbe interactions are often governed by the complex exchange of metabolites. Such interactions play a key role in determining the way pathogenic and commensal species impact their host and in the assembly of complex microbial communities. Recently, several studies have demonstrated how such interactions are reflected in the organization of the metabolic networks of the interacting species, and introduced various graph theory-based methods to predict host-microbe and microbe-microbe interactions directly from network topology. Using these methods, such studies have revealed evolutionary and ecological processes that shape species interactions and community assembly, highlighting the potential of this reverse-ecology research paradigm.

Results

NetCooperate is a web-based tool and a software package for determining host-microbe and microbe-microbe cooperative potential. It specifically calculates two previously developed and validated metrics for species interaction: the Biosynthetic Support Score which quantifies the ability of a host species to supply the nutritional requirements of a parasitic or a commensal species, and the Metabolic Complementarity Index which quantifies the complementarity of a pair of microbial organisms’ niches. NetCooperate takes as input a pair of metabolic networks, and returns the pairwise metrics as well as a list of potential syntrophic metabolic compounds.

Conclusions

The Biosynthetic Support Score and Metabolic Complementarity Index provide insight into host-microbe and microbe-microbe metabolic interactions. NetCooperate determines these interaction indices from metabolic network topology, and can be used for small- or large-scale analyses. NetCooperate is provided as both a web-based tool and an open-source Python module; both are freely available online at http://elbo.gs.washington.edu/software_netcooperate.html.

18.
Microarray profiling of gene expression is a powerful tool for discovery, but the ability to manage and compare the resulting data can be problematic. Biological, experimental, and technical variations between studies of the same phenotype/phenomena create substantial differences in results. The application of conventional meta-analysis to raw microarray data is complicated by differences in the type of microarray used, gene nomenclatures, species, and analytical methods. An alternative approach to combining multiple microarray studies is to compare the published gene lists which result from the investigators' analyses of the raw data, as implemented in Lists of Lists Annotated (LOLA: www.lola.gwu.edu) and L2L (depts.washington.edu/l2l/). The present review considers both the potential value and the limitations of databasing and enabling the comparison of results from different microarray studies. Further, a major impediment to cross-study comparisons is the absence of a standard for reporting microarray study results. We propose a reporting standard: standard microarray results template (SMART), which will facilitate the integration of microarray studies.
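
Comparing two published gene lists usually comes down to asking whether their overlap is larger than chance would give. The sketch below assumes a hypergeometric test against a shared gene universe; the abstract does not say which statistic LOLA or L2L use, so this is only one plausible choice. The function name, gene identifiers, and universe size are hypothetical.

```python
from math import comb

def overlap_p_value(list_a, list_b, universe_size):
    """Return (overlap size, p-value): the probability of an overlap at least
    this large if list_b were drawn at random from the gene universe."""
    a, b = set(list_a), set(list_b)
    k = len(a & b)
    total = comb(universe_size, len(b))
    tail = sum(
        comb(len(a), i) * comb(universe_size - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return k, tail / total

# Hypothetical gene lists from two studies, in a universe of 10 genes.
study_1 = ["g1", "g2", "g3"]
study_2 = ["g2", "g3", "g4"]
overlap, p_value = overlap_p_value(study_1, study_2, universe_size=10)
```

The result depends strongly on `universe_size`, which is one reason the differences in platform and gene nomenclature mentioned in the abstract complicate cross-study comparisons.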

19.
Functional annotation is routinely performed for large-scale genomics projects and databases. Researchers working on more specific problems, for instance on an individual pathway or complex, also need to be able to quickly, completely and accurately annotate sequences. The Bioverse sequence annotation server (http://bioverse.compbio.washington.edu) provides a web-based interface to allow users to submit protein sequences to the Bioverse framework. Sequences are functionally and structurally annotated and potential contextual annotations are provided. Researchers can also submit candidate genomes for annotation of all proteins encoded by the genome (proteome).

20.
MOTIVATION: This work aims to develop computational methods to annotate protein structures in an automated fashion. We employ a support vector machine (SVM) classifier to map from a given class of structures to their corresponding structural (SCOP) or functional (Gene Ontology) annotation. In particular, we build upon recent work describing various kernels for protein structures, where a kernel is a similarity function that the classifier uses to compare pairs of structures. RESULTS: We describe a kernel that is derived in a straightforward fashion from an existing structural alignment program, MAMMOTH. We find in our benchmark experiments that this kernel significantly out-performs a variety of other kernels, including several previously described kernels. Furthermore, in both benchmarks, classifying structures using MAMMOTH alone does not work as well as using an SVM with the MAMMOTH kernel. AVAILABILITY: http://noble.gs.washington.edu/proj/3dkernel
