Similar articles
 20 similar articles found (search time: 15 ms)
1.
MOTIVATION: Classification of biological samples by microarrays is a topic of much interest. A number of methods have been proposed and successfully applied to this problem. It has recently been shown that classification by nearest centroids provides an accurate predictor that may outperform much more complicated methods. The 'Prediction Analysis of Microarrays' (PAM) approach is one such example, which the authors strongly motivate by its simplicity and interpretability. In this spirit, I seek to assess the performance of classifiers simpler than even PAM. RESULTS: I surprisingly show that the modified t-statistics and shrunken centroids employed by PAM tend to increase misclassification error when compared with their simpler counterparts. Based on these observations, I propose a classification method called 'Classification to Nearest Centroids' (ClaNC). ClaNC ranks genes by standard t-statistics, does not shrink centroids and uses a class-specific gene-selection procedure. Because of these modifications, ClaNC is arguably simpler and easier to interpret than PAM, and it can be viewed as a traditional nearest centroid classifier that uses specially selected genes. I demonstrate that ClaNC error rates tend to be significantly less than those for PAM, for a given number of active genes. AVAILABILITY: Point-and-click software is freely available at http://students.washington.edu/adabney/clanc.
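The nearest centroid rule that the abstract above builds on can be sketched as follows. This is a minimal illustration of a plain nearest centroid classifier, not the ClaNC software: the function names, the toy expression values and the hand-picked gene index list are all assumptions, and the t-statistic ranking and class-specific gene selection described in the abstract are left out.

```python
from statistics import mean


def class_centroids(samples, labels):
    """Return {label: centroid}, where a centroid is the per-gene mean of a class."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid(sample, centroids, genes):
    """Assign the sample to the class whose centroid is closest
    (squared Euclidean distance) over the selected gene indices."""
    def dist(centroid):
        return sum((sample[g] - centroid[g]) ** 2 for g in genes)
    return min(centroids, key=lambda label: dist(centroids[label]))


# Toy data (made up): 4 samples x 3 genes, two classes.
samples = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.1], [3.0, 5.0, 0.3], [3.2, 4.9, 0.2]]
labels = ["A", "A", "B", "B"]

centroids = class_centroids(samples, labels)
# Only gene 0 is treated as "active" here; in practice the active genes
# would come from a ranking step, which this sketch does not implement.
predicted = nearest_centroid([2.9, 5.0, 0.2], centroids, genes=[0])
print(predicted)
```

Restricting the distance to a list of gene indices mirrors the idea of classifying on a small set of active genes; how that set is chosen is exactly where the methods compared in the abstract differ.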

2.
3.
4.
ProPred1: prediction of promiscuous MHC Class-I binding sites   Total citations: 5, self-citations: 0, citations by others: 5
SUMMARY: ProPred1 is an on-line web tool for the prediction of peptide binding to MHC class-I alleles. This is a matrix-based method that allows the prediction of MHC binding sites in an antigenic sequence for 47 MHC class-I alleles. The server represents MHC binding regions within an antigenic sequence in user-friendly formats. These formats assist the user in the identification of promiscuous MHC binders in an antigen sequence that can bind to a large number of alleles. ProPred1 also allows the prediction of the standard proteasome and immunoproteasome cleavage sites in an antigenic sequence. This server allows identification of MHC binders that have the cleavage site at the C terminus. The simultaneous prediction of MHC binders and proteasome cleavage sites in an antigenic sequence leads to the identification of potential T-cell epitopes. AVAILABILITY: Server is available at http://www.imtech.res.in/raghava/propred1/. Mirror site of this server is available at http://bioinformatics.uams.edu/mirror/propred1/ Supplementary information: Matrices and document on server are available at http://www.imtech.res.in/raghava/propred1/page2.html
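A matrix-based binding prediction of the general kind mentioned above can be sketched as a sliding-window score. This is only an illustration under stated assumptions: the matrix values, the window width, the threshold and the function names are hypothetical and are not taken from ProPred1, whose actual matrices are published on the server.

```python
def score_peptide(peptide, matrix):
    """Sum the per-position scores of each residue; unlisted residues score 0."""
    return sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def predict_binders(sequence, matrix, threshold):
    """Slide a window of the matrix width over the sequence and return
    (start, peptide, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        peptide = sequence[start:start + width]
        score = score_peptide(peptide, matrix)
        if score >= threshold:
            hits.append((start, peptide, score))
    return hits


# Hypothetical 3-position matrix, kept short for readability; class-I
# peptides are typically around nine residues long.
toy_matrix = [{"L": 2.0, "M": 1.0}, {"A": 0.5}, {"V": 2.0, "L": 1.5}]

hits = predict_binders("GLAVKMAL", toy_matrix, threshold=3.0)
print(hits)
```

Running the same scan with one matrix per allele and keeping the peptides that pass the threshold for many alleles is one simple way to think about promiscuous binders.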

5.
The Biopolymer Markup Language.   Total citations: 6, self-citations: 0, citations by others: 6
SUMMARY: An XML derived from a data model designed to be a hierarchical representation of an organism has been specified and a browser to use this language has been developed. AVAILABILITY: The language definition is available in HTML form at http://www.proteometrics.com/BIOML/. The BioML browser is available on request from the author.

6.
RALEE--RNA ALignment editor in Emacs   Total citations: 5, self-citations: 0, citations by others: 5
SUMMARY: Production of high quality multiple sequence alignments of structured RNAs relies on an iterative combination of manual editing and structure prediction. An essential feature of an RNA alignment editor is the facility to mark-up the alignment based on how it matches a given secondary structure prediction, but few available alignment editors offer such a feature. The RALEE (RNA ALignment Editor in Emacs) tool provides a simple environment for RNA multiple sequence alignment editing, including structure-specific colour schemes, utilizing helper applications for structure prediction and many more conventional editing functions. This is accomplished by extending the commonly used text editor, Emacs, which is available for Linux, most UNIX systems, Windows and Mac OS. AVAILABILITY: The ELISP source code for RALEE is freely available from http://www.sanger.ac.uk/Users/sgj/ralee/ along with documentation and examples. CONTACT: sgj@sanger.ac.uk

7.
SUMMARY: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org CONTACT: andreas@sdsc.edu; pbourne@ucsd.edu.

8.
SUMMARY: AliasServer provides services that facilitate the assembly of data or datasets that make use of different identifiers for referring to the same protein. This resource relies on a database which contains, for a given organism, a non-redundant list of protein sequences associated with a set of aliases. AVAILABILITY: AliasServer is available as an interactive Web server at http://cbi.labri.fr/outils/alias/ and as a web service using a SOAP interface. The complete tool, including sources and data, is available for local installations upon request. SUPPLEMENTARY INFORMATION: Technical documentation is available at http://cbi.labri.fr/outils/alias/asdoc.pdf

9.
MOTIVATION: Motif detection is an important component of the classification and annotation of protein sequences. A method for aligning motifs with an amino acid sequence is introduced. The motifs can be described by the secondary (i.e. functional, biophysical, etc.) characteristics of a signal or pattern to be detected. The results produced are based on the statistical relevance of the alignment. The method was targeted to avoid the problems (i.e. over-fitting, biological interpretation and mathematical soundness) encountered in other methods currently available. RESULTS: The method was tested on lipoprotein signals in B. subtilis yielding stable results. The results of signal prediction were consistent with other methods where literature was available. AVAILABILITY: An implementation of the motif alignment, refining and bootstrapping is available for public use online at http://www.expasy.org/tools/patoseq/

10.
XEMBL: distributing EMBL data in XML format   Total citations: 7, self-citations: 0, citations by others: 7
Data in the EMBL Nucleotide Sequence Database is traditionally available in a flat file format that has a number of known shortcomings. With XML rapidly emerging as a standard data exchange format that can address some problems of flat file formats by defining data structure and syntax, there is now a demand to distribute EMBL data in an XML format. XEMBL is a service tool that employs CORBA servers to access EMBL data, and distributes the data in XML format via a number of mechanisms. AVAILABILITY: Use of the XEMBL service is free of charge at http://www.ebi.ac.uk/xembl/, and can be accessed via web forms, CGI, and a SOAP-enabled service. SUPPLEMENTARY INFORMATION: Information on the EMBL Nucleotide Sequence Database is available at http://www.ebi.ac.uk/embl/. The EMBL Object Model is available at http://corba.ebi.ac.uk/models/. Information on the EMBL CORBA servers is at http://corba.ebi.ac.uk/

11.
MOTIVATION: Genetic regulatory networks are often affected by stochastic noise, due to the low number of molecules taking part in certain reactions. The networks can be simulated using stochastic techniques that model each reaction as a stochastic event. As models become increasingly large and sophisticated, however, the solution time can become excessive; particularly if one wishes to determine the effect on noise of changes to a series of parameters, or the model structure. Methods are therefore required to rapidly estimate stochastic noise. RESULTS: This paper presents an algorithm, based on error growth techniques from non-linear dynamics, to rapidly estimate the noise characteristics of genetic networks of arbitrary size. The method can also be used to determine analytical solutions for simple sub-systems. It is demonstrated on a number of cases, including a prototype model of the galactose regulatory pathway in yeast. AVAILABILITY: A software tool which incorporates the algorithm is available for use as part of the stochastic simulation package Dizzy. It is available for download at http://labs.systemsbiology.net/bolouri/software/Dizzy/ CONTACT: dorrell@systemsbiology.org SUPPLEMENTARY INFORMATION: A conceptual model of the regulatory part of the galactose utilization pathway in yeast, used as an example in the paper, is available at http://labs.systemsbiology.net/bolouri/models/galconcept.dizzy
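The kind of stochastic simulation the abstract above describes as slow, where every reaction is modelled as a random event, can be sketched with the Gillespie direct method on a one-species production and decay model. This is not the error growth algorithm the paper proposes and it is unrelated to the Dizzy code base; the rate constants, function name and seed are illustrative assumptions.

```python
import random


def gillespie_birth_death(k_make, k_decay, n0, t_end, rng):
    """Simulate production (propensity k_make) and decay (propensity
    k_decay * n) of one species; return the visited (time, count) states."""
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        a_make = k_make
        a_decay = k_decay * n
        a_total = a_make + a_decay
        if a_total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_make:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


# Assumed parameters: the long-run mean copy number is k_make / k_decay = 10.
trajectory = gillespie_birth_death(
    k_make=5.0, k_decay=0.5, n0=0, t_end=20.0, rng=random.Random(1)
)
print(len(trajectory), trajectory[-1])
```

Estimating noise this way needs many such runs per parameter set, which is the cost that motivates faster estimation methods like the one the abstract reports.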

12.
SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html.

13.
SUMMARY: We present a distributed and fully cross-platform database search program that allows the user to utilize the idle clock cycles of machines to perform large searches using the most sensitive algorithms. For those in an academic or corporate environment with hundreds of idle desktop machines, DSEARCH can deliver a 'free' database search supercomputer. AVAILABILITY: The software is publicly available under the GNU general public licence from http://www.cs.may.ie/distributed CONTACT: tom.naughton@may.ie SUPPLEMENTARY INFORMATION: Full documentation and a user manual are available from http://www.cs.may.ie/distributed.

14.
15.
SUMMARY: Dasty2 is a highly interactive web client integrating protein sequence annotations from currently more than 40 sources, using the distributed annotation system (DAS). AVAILABILITY: Dasty2 is an open source tool freely available under the terms of the Apache License 2.0, publicly available at http://www.ebi.ac.uk/dasty/.

16.
SUMMARY: affylmGUI is a graphical user interface (GUI) to an integrated workflow for Affymetrix microarray data. The user is able to proceed from raw data (CEL files) to QC and pre-processing, and eventually to analysis of differential expression using linear models with empirical Bayes smoothing. Output of the analysis (tables and figures) can be exported to an HTML report. The GUI provides user-friendly access to state-of-the-art methods embodied in the Bioconductor software repository. AVAILABILITY: affylmGUI is an R package freely available from http://www.bioconductor.org. It requires R version 1.9.0 or later and tcl/tk 8.3 or later and has been successfully tested on Windows 2000, Windows XP, Linux (RedHat and Fedora distributions) and Mac OS/X with X11. Further documentation is available at http://bioinf.wehi.edu.au/affylmGUI CONTACT: keith@wehi.edu.au.

17.
SUMMARY: The purpose of this work is to provide the modern molecular geneticist with tools to perform more efficient and more accurate analysis of the genotype data they produce. By using Microsoft Excel macros written in Visual Basic, we can translate genotype data into a form readable by the versatile software 'Arlequin', read the Arlequin output, calculate statistics of linkage disequilibrium, and put the results in a format for viewing with the software 'GOLD'. AVAILABILITY: The software is available by FTP at: ftp://xcsg.iarc.fr/cox/Genotype_Transposer/. SUPPLEMENTARY INFORMATION: Detailed instructions and examples are available at: ftp://xcsg.iarc.fr/cox/Genotype&_Transposer/. Arlequin is available at: http://lgb.unige.ch/arlequin/. GOLD is available at: http://www.well.ox.ac.uk/asthma/GOLD/.

18.
Alignment of molecular networks by integer quadratic programming   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: With more and more data on molecular networks (e.g. protein interaction networks, gene regulatory networks and metabolic networks) available, the discovery of conserved patterns or signaling pathways by comparing various kinds of networks among different species or within a species becomes an increasingly important problem. However, most of the conventional approaches either restrict comparative analysis to special structures, such as pathways, or adopt heuristic algorithms due to computational burden. RESULTS: In this article, to find the conserved substructures, we develop an efficient algorithm for aligning molecular networks based on both molecule similarity and architecture similarity, by using integer quadratic programming (IQP). Such an IQP can be relaxed into the corresponding quadratic programming (QP) which almost always ensures an integer solution, thereby making molecular network alignment tractable without any approximation. The proposed framework is very flexible and can be applied to many kinds of molecular networks including weighted and unweighted, directed and undirected networks with or without loops. AVAILABILITY: Matlab code and data are available from http://zhangroup.aporc.org/bioinfo/MNAligner or http://intelligent.eic.osaka-sandai.ac.jp/chenen/software/MNAligner, or upon request from authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
The LAMS™ database stores data on a colony of breeding animals. Forms are hierarchical and show details of internal codes, matings, litters, and offspring. The identifier given to each animal can be subdivided as such. Each form shows an abbreviated list of the related data from the form one level down, and some special fields, when double-clicked, cause the related record to be displayed. The print button allows the user to print the current record and its related records. Other buttons on each form allow the user to amend, delete, find, and add records within certain rules. User-defined lists are created to allow the selection of various characteristics during data entry. The offspring form contains a section where the user can define the label of a comment and/or a text field. This name is then always subsequently available as an option in a list of user-defined labels. Reports are available for tailtipping dates (if applicable) and calculation of genetic ratios. A query form allows the user to filter the records in the offspring form to the criteria specified, and a display of the actual query submitted is shown. An integrated HELP is available. The LAMS™ database is available at http://www.hgu.mrc.ac.uk/Softdata/Lams Received: 27 April 1998 / Accepted: 9 November 1998

20.
SUMMARY: GenColors is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes, considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. The genome comparison tools determine, for example, best-bidirectional hits, gene conservation, syntenies and gene core sets. Swiss-Prot/TrEMBL hits allow annotations in an effective manner. To further support the annotation, base-specific quality data can also be displayed if available. With GenColors, dedicated genome browsers containing a group of related genomes can be easily set up and maintained. It has been efficiently used for Borrelia garinii and is currently applied to various ongoing genome projects. AVAILABILITY: Detailed information on GenColors is available at http://gencolors.imb-jena.de. Online usage of GenColors-based genome browsers is the preferred application mode. The system is also available upon request for local installation.
