Similar literature
1.
SUMMARY: MaxBench is a web-based system available for evaluating the results of sequence and structure comparison methods, based on the SCOP protein domain classification. The system makes it easy for developers to both compare the overall performance of their methods to standard algorithms and investigate the results of individual comparisons. AVAILABILITY: http://www.sanger.ac.uk/Users/lp1/MaxBench/

2.
In the present contribution we propose two recently developed classification algorithms for the analysis of mass-spectrometric data: the supervised neural gas and the fuzzy-labeled self-organizing map. The algorithms are inherently regularizing, which is recommended for these spectral data because of their high dimensionality and, for specific problems, their sparseness. The algorithms are both prototype-based, so that the principle of characteristic representatives is realized. This leads to an easy interpretation of the generated classification model. Further, the fuzzy-labeled self-organizing map is able to process uncertainty in data, and classification results can be obtained as fuzzy decisions. Moreover, this fuzzy classification together with the property of topographic mapping offers the possibility of class similarity detection, which can be used for class visualization. We demonstrate the power of both methods on two examples: the classification of bacteria (listeria types) and of neoplastic and non-neoplastic cell populations in breast cancer tissue sections.

3.
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparison with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

4.
MOTIVATION: Advances in microscopy technology have led to the creation of high-throughput microscopes that are capable of generating several hundred gigabytes of images in a few days. Analyzing such a wealth of data manually is nearly impossible and requires an automated approach. There are at present a number of open-source and commercial software packages that allow the user to apply algorithms of different degrees of sophistication to the images and extract desired metrics. However, the types of metrics that can be extracted are severely limited by the specific image processing algorithms that the application implements, and by the expertise of the user. In most commercial software, code unavailability prevents implementation by the end user of newly developed algorithms better suited for a particular type of imaging assay. While it is possible to implement new algorithms in open-source software, rewiring an image processing application requires a high degree of expertise. To obviate these limitations, we have developed an open-source high-throughput application that allows implementation of different biological assays such as cell tracking or ancestry recording, through the use of small, relatively simple image processing modules connected into sophisticated imaging pipelines. By connecting modules, non-expert users can apply the particular combination of well-established and novel algorithms developed by us and others that are best suited for each individual assay type. In addition, our data exploration and visualization modules make it easy to discover or select specific cell phenotypes from a heterogeneous population. AVAILABILITY: CellAnimation is distributed under the Creative Commons Attribution-NonCommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/). CellAnimation source code and documentation may be downloaded from www.vanderbilt.edu/viibre/software/documents/CellAnimation.zip. Sample data are available at www.vanderbilt.edu/viibre/software/documents/movies.zip. CONTACT: walter.georgescu@vanderbilt.edu SUPPLEMENTARY INFORMATION: Supplementary data available at Bioinformatics online.

5.
MOTIVATION: The efficiency of bioinformatics programmers can be greatly increased through the provision of ready-made software components that can be rapidly combined, with additional bespoke components where necessary, to create finished programs. The new standard for C++ includes an efficient and easy to use library of generic algorithms and data-structures, designed to facilitate low-level component programming. The extension of this library to include functionality that is specifically useful in compute-intensive tasks in bioinformatics and molecular modelling could provide an effective standard for the design of reusable software components within the biocomputing community. RESULTS: A novel application of generic programming techniques in the form of a library of C++ components called the Bioinformatics Template Library (BTL) is presented. This library will facilitate the rapid development of efficient programs by providing efficient code for many algorithms and data-structures that are commonly used in biocomputing, in a generic form that allows them to be flexibly combined with application specific object-oriented class libraries. AVAILABILITY: The BTL is available free of charge from our web site http://www.cryst.bbk.ac.uk/~classlib/ and the EMBL file server http://www.embl-ebi.ac.uk/FTP/index.html

6.
MOTIVATION: Rapid software prototyping can significantly reduce development times in the field of computational molecular biology and molecular modeling. Biochemical Algorithms Library (BALL) is an application framework in C++ that has been specifically designed for this purpose. RESULTS: BALL provides an extensive set of data structures as well as classes for molecular mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Its extensibility, which results from an object-oriented and generic programming approach, especially distinguishes it from other software packages. BALL is well suited to serve as a public repository for reliable data structures and algorithms. We show in an example that the implementation of complex methods is greatly simplified when using the data structures and functionality provided by BALL.

7.
Sarment is a package of Python modules for easy building and manipulation of sequence segmentations. It provides efficient implementations of the usual algorithms for hidden Markov model computation, as well as for maximal predictive partitioning. Owing to its very large variety of criteria for computing segmentations, Sarment can handle many kinds of models. Because of object-oriented programming, the results of the segmentation are very easy to manipulate.

8.
MOTIVATION: Although numerous algorithms have been developed for microarray segmentation, extensive comparisons between the algorithms have received far less attention. In this study, we evaluate the performance of nine microarray segmentation algorithms. Using both simulated and real microarray experiments, we overcome the challenges in performance evaluation that arise from the lack of ground-truth information. The use of simulated experiments allows us to analyze the segmentation accuracy at the single-pixel level, as is commonly done in traditional image processing studies. With real experiments, we indirectly measure the segmentation performance, identify significant differences between the algorithms, and study the characteristics of the resulting gene expression data. RESULTS: Overall, our results show clear differences between the algorithms. The results demonstrate how the segmentation performance depends on the image quality, which algorithms operate on significantly different performance levels, and how the selection of a segmentation algorithm affects the identification of differentially expressed genes. AVAILABILITY: Supplementary results and the microarray images used in this study are available at the companion web site http://www.cs.tut.fi/sgn/csb/spotseg/

9.

Background

Genomic islands play an important role in medical, methylation and biological studies. To explore the region, we propose a CpG islands prediction analysis platform for genome sequence exploration (CpGPAP).

Results

CpGPAP is a web-based application that provides a user-friendly interface for predicting CpG islands in genome sequences or in user input sequences. The prediction algorithms supported in CpGPAP include complementary particle swarm optimization (CPSO), a complementary genetic algorithm (CGA) and other methods (CpGPlot, CpGProD and CpGIS) found in the literature. The CpGPAP platform is easy to use and has three main features: (1) selection of the prediction algorithm; (2) graphic visualization of results; and (3) application of related tools and dataset downloads. These features allow the user to easily view CpG island results and download the relevant island data. CpGPAP is freely available at http://bio.kuas.edu.tw/CpGPAP/.

Conclusions

The platform's supported algorithms (CPSO and CGA) provide a higher sensitivity and a higher correlation coefficient when compared to CpGPlot, CpGProD, CpGIS, and CpGcluster over an entire chromosome.

10.
Cotta C, Moscato P. Bio Systems, 2003, 72(1-2): 75-97
We propose a heuristic approach to hierarchical clustering from distance matrices based on the use of memetic algorithms (MAs). By using MAs to solve some variants of the Minimum Weight Hamiltonian Path Problem on the input matrix, a sequence of the individual elements to be clustered (referred to as patterns) is first obtained. While this problem is also NP-hard, a probably optimal sequence is easy to find with the current advances for this problem and helps to prune the space of possible solutions and/or to guide the search performed by an actual clustering algorithm. This technique has been successfully applied to both a Branch-and-Bound algorithm, and to evolutionary algorithms and MAs. Experimental results are given in the context of phylogenetic inference and in the hierarchical clustering of gene expression data.

11.
Many bioinformatics solutions suffer from the lack of a usable interface or platform from which results can be analyzed and visualized. Overcoming this hurdle would allow for more widespread dissemination of bioinformatics algorithms within the biological and medical communities. The algorithms should be accessible without extensive technical support or programming knowledge. Here, we propose a dynamic wizard platform that provides users with a Graphical User Interface (GUI) for most Java bioinformatics library toolkits. The application interface is generated in real-time based on the original source code. This platform lets developers focus on designing algorithms and biologists/physicians on testing hypotheses and analyzing results. AVAILABILITY: The open source code can be downloaded from: http://bcl.med.harvard.edu/proteomics/proj/APBA/.

12.
MOTIVATION: The nearest shrunken centroid (NSC) method has been successfully applied in many DNA-microarray classification problems. The NSC uses 'shrunken' centroids as prototypes for each class and identifies subsets of genes that best characterize each class. Classification is then made to the nearest (shrunken) centroid. The NSC is very easy to implement and very easy to interpret; however, it has drawbacks. RESULTS: We show that the NSC method can be interpreted in the framework of LASSO regression. Based on that, we consider two new methods, adaptive L(infinity)-norm penalized NSC (ALP-NSC) and adaptive hierarchically penalized NSC (AHP-NSC), with two different penalty functions for microarray classification, which improve over the NSC. Unlike the L(1)-norm penalty used in LASSO, the penalty terms that we consider make use of the fact that parameters belonging to one gene should be treated as a natural group. Numerical results indicate that the two new methods tend to remove irrelevant genes more effectively and provide better classification results than the L(1)-norm approach. AVAILABILITY: R code for the ALP-NSC and the AHP-NSC algorithms is available from the authors upon request.
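The baseline NSC idea described in this abstract (shrink each class centroid toward the overall centroid, then classify to the nearest shrunken centroid) can be illustrated with a minimal sketch. This is a simplified illustration, not the authors' ALP-NSC or AHP-NSC code: it soft-thresholds raw centroid offsets and omits the per-gene standardization and class priors used in the published NSC method. The function names and the `delta` parameter are assumptions chosen for this example.

```python
import numpy as np

def soft_threshold(d, delta):
    """Move each value toward zero by delta; values within delta of zero become zero."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_shrunken_centroids(X, y, delta):
    """Return (classes, shrunken centroids) for samples X (n x genes) and labels y.

    Simplification: raw centroid offsets are shrunk directly, without the
    pooled within-class standardization of the published NSC method.
    """
    classes = np.unique(y)
    overall = X.mean(axis=0)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    shrunken = overall + soft_threshold(centroids - overall, delta)
    return classes, shrunken

def predict(X, classes, shrunken):
    """Assign each row of X to the class with the nearest shrunken centroid."""
    dists = ((X[:, None, :] - shrunken[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Toy data: gene 0 separates the classes, gene 1 does not.
X = np.array([[0.0, 0.0], [0.0, 0.2], [4.0, 0.0], [4.0, 0.2]])
y = np.array([0, 0, 1, 1])
classes, shrunken = fit_shrunken_centroids(X, y, delta=0.5)
labels = predict(np.array([[0.0, 0.0], [4.0, 0.0]]), classes, shrunken)
```

In this toy data the second gene has the same mean in both classes, so its offset is zero before and after shrinkage and it does not influence the classification; this is the sense in which shrinkage selects genes.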

13.
Text similarity: an alternative way to search MEDLINE   (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: The most widely used literature search techniques, such as those offered by NCBI's PubMed system, require significant effort on the part of the searcher, and inexperienced searchers do not use these systems as effectively as experienced users. Improved literature search engines can save researchers time and effort by making it easier to locate the most important and relevant literature. RESULTS: We have created and optimized a new, hybrid search system for Medline that takes natural text as input and then delivers results with high precision and recall. The system combines a fast, low-sensitivity, weighted keyword-based first-pass algorithm, which casts a wide net to gather an initial set of literature, with a unique sentence-alignment-based similarity algorithm, which rank-orders those results; the combination is sensitive, fast and easy to use. Several text similarity search algorithms, both standard and novel, were implemented and tested in order to determine which obtained the best results in information retrieval exercises. AVAILABILITY: Literature searching algorithms are implemented in a system called eTBLAST, freely accessible over the web at http://invention.swmed.edu. A variety of other derivative systems and visualization tools provides the user with an enhanced experience and additional capabilities. CONTACT: Harold.Garner@UTSouthwestern.edu.

14.
This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
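As an illustration of one of the tasks named in this abstract, the longest common subsequence, the textbook dynamic-programming formulation can be sketched as follows. This is a generic quadratic-time version written for this listing; it is not taken from BATS, whose own implementation and interface may differ, and the function name `lcs` is an assumption.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Trace back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For example, `lcs("AGGTAB", "GXTXAYB")` yields a common subsequence of length 4.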

15.
16.
MOTIVATION: Two proteins can have a similar 3-dimensional structure and biological function, but have sequences sufficiently different that traditional protein sequence comparison algorithms do not identify their relationship. The desire to identify such relations has led to the development of more sensitive sequence alignment strategies. One such strategy is the Intermediate Sequence Search (ISS), which connects two proteins through one or more intermediate sequences. In its brute-force implementation, ISS is a strategy that repetitively uses the results of the previous query as new search seeds, making it time-consuming and difficult to analyze. RESULTS: Saturated BLAST is a package that performs ISS in an efficient and automated manner. It was developed using Perl and Perl/Tk and implemented on the LINUX operating system. Starting with a protein sequence, Saturated BLAST runs a BLAST search and identifies representative sequences for the next generation of searches. The procedure is run until convergence or until some predefined criteria are met. Saturated BLAST has a friendly graphic user interface, a built-in BLAST result parser, several multiple alignment tools, clustering algorithms and various filters for the elimination of false positives, thereby providing an easy way to edit, visualize, analyze, monitor and control the search. Besides detecting remote homologies, Saturated BLAST can be used to maintain protein family databases and to search for new genes in genomic databases.
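The brute-force ISS loop described in this abstract (reuse the hits of each round as seeds for the next, until nothing new is found or a limit is reached) can be sketched roughly as below. This is a hedged illustration only: `find_hits` is a hypothetical stand-in for a real BLAST search, `max_rounds` is an assumed stopping criterion, and the sketch reuses every new hit as a seed, whereas Saturated BLAST selects representative sequences for the next generation.

```python
def intermediate_sequence_search(seed, find_hits, max_rounds=10):
    """Iteratively expand a set of related sequences starting from seed.

    find_hits(query) is a hypothetical callable returning the identifiers
    matched by one search; in practice it would wrap a BLAST run.
    """
    found = {seed}
    frontier = [seed]
    for _ in range(max_rounds):
        next_frontier = []
        for query in frontier:
            for hit in find_hits(query):
                if hit not in found:
                    found.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:
            # Convergence: the last round produced no new sequences.
            break
        frontier = next_frontier
    return found

# Toy "database": A matches B, B matches C, C matches D.
toy_hits = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": []}
related = intermediate_sequence_search("A", lambda q: toy_hits.get(q, []))
```

In the toy example, A and D never match directly but are connected through the intermediates B and C, which is the situation ISS is meant to detect.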

17.
MOTIVATION: Due to recent interest in the use of textual material to augment traditional experiments it has become necessary to automatically cluster, classify and filter natural language information. RESULTS: The Simple and Robust Abbreviation Dictionary (SaRAD) provides an easy to implement, high performance tool for the construction of a biomedical symbol dictionary. The algorithms, applied to the MEDLINE document set, result in a high quality dictionary and toolset to disambiguate abbreviation symbols automatically.

18.
Genesis: cluster analysis of microarray data   (total citations: 26; self-citations: 0; citations by others: 26)

19.
20.