Similar literature
 20 similar articles found
1.
PIR: a new resource for bioinformatics   Total citations: 3, self-citations: 0, citations by others: 3
SUMMARY: The Protein Information Resource (PIR) has greatly expanded its Web site and developed a set of interactive search and analysis tools to facilitate the analysis, annotation, and functional identification of proteins. New search engines have been implemented to combine sequence similarity search results with database annotation information. The new PIR search systems have proved very useful in providing enriched functional annotation of protein sequences, determining protein superfamily-domain relationships, and detecting annotation errors in genomic database archives. AVAILABILITY: http://pir.georgetown.edu/. CONTACT: mcgarvey@nbrf.georgetown.edu

2.
John Lhota, Lei Xie. Proteins, 2016, 84(4): 467-472
Protein structure prediction, when construed as a fold recognition problem, is one of the most important applications of similarity search in bioinformatics. A new protein-fold recognition method is reported which combines a single-source K diverse shortest path (SSKDSP) algorithm with the Enrichment of Network Topological Similarity (ENTS) algorithm to search a graphic feature space generated using sequence similarity and structural similarity metrics. A modified, more efficient SSKDSP algorithm is developed to improve the performance of graph searching. The new implementation of the SSKDSP algorithm empirically requires 82% less memory and 61% less time than the current implementation, allowing for the analysis of larger, denser graphs. Furthermore, the statistical significance of the fold ranking generated from SSKDSP is assessed using ENTS. The reported ENTS-SSKDSP algorithm outperforms the original ENTS, which uses random walk with restart for the graph search, as well as the state-of-the-art protein structure prediction algorithms HHSearch and Sparks-X, as evaluated by a benchmark of 600 query proteins. The reported methods may easily be extended to other similarity search problems in bioinformatics and chemoinformatics. The SSKDSP software is available at http://compsci.hunter.cuny.edu/~leixie/sskdsp.html . Proteins 2016; 84:467–472. © 2016 Wiley Periodicals, Inc.

3.
Similarity search over protein 3D structures has become complex and computationally expensive because protein structure databases continue to grow rapidly. Fast structural similarity search systems are needed for practical use in protein structure classification, but existing comparison systems do not return results in a timely manner. Our approach uses multi-step processing that consists of a preprocessing step to represent the geometry of protein structures with spatial objects, a filter step to generate a small candidate set using approximate topological string matching, and a refinement step to compute a structural alignment. This paper describes the preprocessing and filtering for fast similarity search using the discovery of topological patterns of secondary structure elements based on spatial relations. Our system is fully implemented using Oracle 8i Spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures by using the spatial relations of spatial databases is practical for fast structural similarity search for proteins.

4.
MOTIVATION: Word-matching algorithms such as BLAST are routinely used for sequence comparison. These algorithms typically use areas of matching words to seed alignments which are then used to assess the degree of sequence similarity. In this paper, we show that by formally separating the word-matching and sequence-alignment processes, and using information about word frequencies to generate alignments and similarity scores, we can create a new sequence-comparison algorithm which is both fast and sensitive. The formal split between word searching and alignment allows users to select an appropriate alignment method without affecting the underlying similarity search. The algorithm has been used to develop software for identifying entries in DNA sequence databases which are contaminated with vector sequence. RESULTS: We present three algorithms, RAPID, PHAT and SPLAT, which together allow vector contaminations to be found and assessed extremely rapidly. RAPID is a word search algorithm which uses probabilities to modify the significance attached to different words; PHAT and SPLAT are alignment algorithms. An initial implementation has been shown to be approximately an order of magnitude faster than BLAST. The formal split between word searching and alignment not only offers considerable gains in performance, but also allows alignment generation to be viewed as a user interface problem, allowing the most useful output method to be selected without affecting the underlying similarity search. Receiver Operator Characteristic (ROC) analysis of an artificial test set allows the optimal score threshold for identifying vector contamination to be determined. ROC curves were also used to determine the optimum word size (nine) for finding vector contamination. An analysis of the entire expressed sequence tag (EST) subset of EMBL found a contamination rate of 0.27%. A more detailed analysis of the 50 000 ESTs in est10.dat (an EST subset of EMBL) finds an error rate of 0.86%, principally due to two large-scale projects. AVAILABILITY: A Web page for the software exists at http://bioinf.man.ac.uk/rapid, or it can be downloaded from ftp://ftp.bioinf.man.ac.uk/RAPID CONTACT: crispin@cs.man.ac.uk
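
The word-search idea in this abstract can be illustrated with a minimal sketch. It is not the RAPID algorithm itself: the function names, the logarithmic weighting, and the handling of unseen words are assumptions made for illustration, and only the word size of nine comes from the abstract. The sketch indexes the fixed-length words of a vector sequence and scores a query by the words it shares with the vector, giving rarer background words more weight.

```python
import math
from collections import Counter

WORD_SIZE = 9  # optimum word size reported in the abstract


def words(seq, k=WORD_SIZE):
    """Return all overlapping k-letter words of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_weights(background, k=WORD_SIZE):
    """Hypothetical weighting: rarer background words get larger weights."""
    counts = Counter(w for seq in background for w in words(seq, k))
    total = sum(counts.values())
    return {w: -math.log(c / total) for w, c in counts.items()}, total


def match_score(query, vector, weights, total, k=WORD_SIZE):
    """Sum the weights of query words that also occur in the vector."""
    vector_words = set(words(vector, k))
    unseen = math.log(total + 1)  # assumed weight for words absent from the background
    return sum(weights.get(w, unseen) for w in words(query, k) if w in vector_words)
```

A query with no word in common with the vector scores zero, so a score threshold (chosen by ROC analysis in the abstract) would separate contaminated from clean entries.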

5.
This study describes novel algorithms for searching for most parsimonious trees. These algorithms are implemented as a parsimony computer program, PARSIGAL, which performs well even with difficult data sets. For high level search, PARSIGAL uses an evolutionary optimization algorithm, which feeds good tree candidates to a branch-swapping local search procedure. This study also describes an extremely fast method of recomputing state sets for binary characters (additive or nonadditive characters with two states), based on packing 32 characters into a single memory word and recomputing the tree simultaneously for all 32 characters using fast bitwise logical operations. The operational principles of PARSIGAL are quite different from those previously published for other parsimony computer programs. Hence it is conceivable that PARSIGAL may be able to locate islands of trees that are different from those that are easily located with existing parsimony computer programs.
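
The bit-packing idea can be sketched as follows. This is an illustrative reconstruction and not PARSIGAL's code: it assumes the standard Fitch rule for two-state characters and represents each node's state sets as two 32-bit masks, one per state. The names `pack` and `fitch_join` are hypothetical.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def pack(states):
    """Pack up to 32 binary character states (0 or 1) into two bit masks.

    Bit i of the first mask means "state 0 is possible for character i",
    bit i of the second means "state 1 is possible". Unused positions are
    padded with state 0 so that they never add steps.
    """
    has0, has1 = MASK, 0
    for i, s in enumerate(states):
        if s == 1:
            has0 &= ~(1 << i)
            has1 |= 1 << i
    return has0, has1


def fitch_join(left, right):
    """Combine two child state sets; return (parent_sets, extra_steps)."""
    l0, l1 = left
    r0, r1 = right
    i0, i1 = l0 & r0, l1 & r1        # intersections for all characters at once
    empty = ~(i0 | i1) & MASK        # characters whose intersection is empty
    p0 = i0 | (empty & (l0 | r0))    # fall back to the union where it is empty
    p1 = i1 | (empty & (l1 | r1))
    return (p0, p1), bin(empty).count("1")
```

Each call handles all 32 characters with a handful of logical operations, and the population count of `empty` gives the number of steps added at that node.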

6.
7.
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 said undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface.

8.
The protein information resource (PIR)   Total citations: 13, self-citations: 0, citations by others: 13
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.

9.
10.
The organization of order picking operations is one of the most critical issues in warehouse management. In this paper, novel tabu search (TS) algorithms integrated with a novel clustering algorithm are proposed to solve the order batching and picker routing problems jointly for multiple-cross-aisle warehouse systems. A clustering algorithm that generates an initial solution for the TS algorithms is developed to provide fast and effective solutions to the order-batching problem. Unlike most common picker routing heuristics, we model the routing problem of pickers as a classical TSP and propose efficient Nearest Neighbor+Or-opt and Savings+2-Opt heuristics to meet the specific features of the problem. Various problem instances including the number of orders, weight of items, and picking coordinates are generated randomly, and detailed numerical experiments are carried out to evaluate the performances of the proposed methods. In conclusion, the TS algorithms prove to be the most efficient methods in terms of solution quality and computational efficiency.
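
As a rough illustration of the construction-plus-improvement pattern named above, the sketch below builds a tour with the nearest-neighbor rule and then improves it with 2-opt. Note that this pairing differs from the two combinations in the abstract (Nearest Neighbor+Or-opt and Savings+2-Opt), and that Euclidean distances between picking coordinates are an assumption; a real warehouse would use aisle distances.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def nearest_neighbor(points, start=0):
    """Greedy construction: always walk to the closest unvisited location."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(points[tour[-1]], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """Reverse tour segments for as long as doing so shortens the tour."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best, improved = candidate, True
    return best
```

Recomputing the full tour length for every candidate keeps the sketch short; a practical implementation would evaluate only the two edges that change.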

11.
We present general algorithms for the compression of molecular dynamics trajectories. The standard ways to store MD trajectories as text or as raw binary floating point numbers result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, as well as benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3–1:35 depending on the frequency of storage of frames and the system studied.
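
The three ingredients described above (reduced precision, differences between frames, and a block-sorting compressor with Huffman coding) can be combined in a short sketch. This is not the authors' file format: the function names, the precision of 0.001, and the 32-bit integer packing are assumptions, and Python's standard `bz2` module stands in for the final stage because bzip2 is itself a block-sorting compressor that uses Huffman coding.

```python
import bz2
import struct


def compress_frames(frames, precision=0.001):
    """Quantize coordinates, store time differences, then compress with bz2."""
    previous = [0] * len(frames[0])
    deltas = []
    for frame in frames:
        quantized = [round(x / precision) for x in frame]
        deltas.extend(q - p for q, p in zip(quantized, previous))
        previous = quantized
    # Assumes every difference fits in a signed 32-bit integer.
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return bz2.compress(raw)


def decompress_frames(blob, n_coords, precision=0.001):
    """Invert compress_frames, up to the chosen precision."""
    raw = bz2.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    frames, previous = [], [0] * n_coords
    for start in range(0, len(deltas), n_coords):
        previous = [p + d for p, d in zip(previous, deltas[start:start + n_coords])]
        frames.append([q * precision for q in previous])
    return frames
```

Only time differences are used here; the abstract also considers differences in space and combinations of the two.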

12.

Background

Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial species. Existing tools to investigate genomic contexts, however, lack features for dynamically comparing and exploring genomic regions from multiple species. As DNA sequencing technologies improve and the number of whole sequenced microbial genomes increases, a user-friendly genome context comparison platform designed for use by a broad range of users promises to satisfy a growing need in the biological community.

Results

Here we present JContextExplorer: a tool that organizes genomic contexts into branching diagrams. We implement several alternative context-comparison and tree rendering algorithms, and allow for easy transitioning between different clustering algorithms. To facilitate genomic context analysis, our tool implements GUI features, such as text search filtering, point-and-click interrogation of individual contexts, and genomic visualization via a multi-genome browser. We demonstrate a use case of our tool by attempting to resolve annotation ambiguities between two highly homologous yet functionally distinct genes in a set of 22 alpha and gamma proteobacteria.

Conclusions

JContextExplorer should enable a broad range of users to analyze and explore genomic contexts. The program has been tested on Windows, Mac, and Linux operating systems, and is implemented both as an executable JAR file and Java WebStart. Program executables, source code, and documentation are available at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.

13.
Correct and bias-free interpretation of deep sequencing data inevitably depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm, FANSe2, with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency with the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.

14.
In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.

15.
Often, the most informative genes have to be selected from different gene sets, and several computer gene ranking algorithms have been developed to cope with the problem. To help researchers decide which algorithm to use, we developed the analysis of gene ranking algorithms (AGRA) system that offers a novel technique for comparing ranked lists of genes. The most important feature of AGRA is that no previous knowledge of gene ranking algorithms is needed for their comparison. Using the text mining system finding-associated concepts with text analysis, AGRA defines what we call biomedical concept space (BCS) for each gene list and offers a comparison of the gene lists in six different BCS categories. The uploaded gene lists can be compared using two different methods. In the first method, the overlap between the BCSs of each pair of gene lists is calculated. The second method offers a text field where a specific biomedical concept can be entered. AGRA searches for this concept in each gene list's BCS, highlights the rank of the concept and offers a visual representation of concepts ranked above and below it. AVAILABILITY AND IMPLEMENTATION: Available at http://agra.fzv.uni-mb.si/, implemented in Java and running on the Glassfish server. CONTACT: simon.kocbek@uni-mb.si.

16.
Optimal spliced alignment of homologous cDNA to a genomic DNA template   Total citations: 17, self-citations: 0, citations by others: 17
MOTIVATION: Supplementary cDNA or EST evidence is often decisive for discriminating between alternative gene predictions derived from computational sequence inspection by any of a number of requisite programs. Without additional experimental effort, this approach must rely on the occurrence of cognate ESTs for the gene under consideration in available, generally incomplete, EST collections for the given species. In some cases, particular exon assignments can be supported by sequence matching even if the cDNA or EST is produced from non-cognate genomic DNA, including different loci of a gene family or homologous loci from different species. However, marginally significant sequence matching alone can also be misleading. We sought to develop an algorithm that would simultaneously score for predicted intrinsic splice site strength and sequence matching between the genomic DNA template and a related cDNA or EST. In this case, weakly predicted splice sites may be chosen for the optimal scoring spliced alignment on the basis of surrounding sequence matching. Strongly predicted splice sites will enter the optimal spliced alignment even without strong sequence matching. RESULTS: We designed a novel algorithm that produces the optimal spliced alignment of a genomic DNA with a cDNA or EST based on scoring for both sequence matching and intrinsic splice site strength. By example, we demonstrate that this combined approach appears to improve gene prediction accuracy compared with current methods that rely only on either search by content and signal or on sequence similarity. AVAILABILITY: The algorithm is available as a C subroutine and is implemented in the SplicePredictor and GeneSeqer programs. The source code is available via anonymous ftp from ftp.zmdb.iastate.edu. Both programs are also implemented as a Web service at http://gremlin1.zool.iastate.edu/cgi-bin/sp.cgi and http://gremlin1.zool.iastate.edu/cgi-bin/gs.cgi, respectively. CONTACT: vbrendel@iastate.edu

17.
Since 1995, the WU-BLAST programs (http://blast.wustl.edu) have provided a fast, flexible and reliable method for similarity searching of biological sequence databases. The software is in use at many locales and web sites. The European Bioinformatics Institute's WU-Blast2 (http://www.ebi.ac.uk/blast2/) server has been providing free access to these search services since 1997 and today supports many features that both enhance the usability and expand on the scope of the software.

18.
A reduction-based exact algorithm for the contact map overlap problem.   Total citations: 1, self-citations: 0, citations by others: 1
Aligning proteins based on their structural similarity is a fundamental problem in molecular biology with applications in many settings, including structure classification, database search, function prediction, and assessment of folding prediction methods. Structural alignment can be done via several methods, including contact map overlap (CMO) maximization that aligns proteins in a way that maximizes the number of common residue contacts. In this paper, we develop a reduction-based exact algorithm for the CMO problem. Our approach solves CMO directly rather than after transformation to other combinatorial optimization problems. We exploit the mathematical structure of the problem in order to develop a number of efficient lower bounding, upper bounding, and reduction schemes. Computational experiments demonstrate that our algorithm runs significantly faster than existing exact algorithms and solves some hard CMO instances that were not solved in the past. In addition, the algorithm produces protein clusters that are in excellent agreement with the SCOP classification. An implementation of our algorithm is accessible as an on-line server at http://eudoxus.scs.uiuc.edu/cmos/cmos.html.
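
To make the CMO objective concrete, the sketch below evaluates the number of common contacts for one given alignment. It does not search for the best alignment, which is the hard part addressed by the exact algorithm in the abstract. The function name and the data representation (sets of index pairs and a dictionary for the alignment) are assumptions for illustration.

```python
def contact_map_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b: sets of residue-index pairs (i, j) with i < j.
    alignment: dict mapping residue indices of A to residue indices of B;
    it must preserve residue order (i < j implies alignment[i] < alignment[j]).
    """
    aligned = sorted(alignment.items())
    if any(b1 >= b2 for (_, b1), (_, b2) in zip(aligned, aligned[1:])):
        raise ValueError("alignment must preserve residue order")
    return sum(
        1
        for i, j in contacts_a
        if i in alignment and j in alignment
        and (alignment[i], alignment[j]) in contacts_b
    )
```

CMO maximization asks for the order-preserving alignment that makes this count as large as possible.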

19.

Background  

With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: BM25 and the ranking algorithm implemented in the open-source Lucene search engine.
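
For readers unfamiliar with BM25, a minimal scoring sketch follows. It is a generic Okapi BM25 formulation and not the configuration evaluated in the study: the parameter values k1=1.2 and b=0.75 are common defaults assumed here, the IDF is a commonly used smoothed variant, and documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    doc_freq = Counter(term for d in documents for term in set(d))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The same function can score abstracts, full-text articles, or paragraph spans; only the unit treated as a "document" changes, which is the comparison the abstract describes.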

20.
MOTIVATION: Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. RESULTS: We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. AVAILABILITY: The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
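
The time and space bounds quoted above can be illustrated with the standard global alignment recurrence. The sketch below is plain Needleman-Wunsch scoring with linear gap costs, not the generalized model implemented in GAP3, and it returns only the optimal score rather than the alignment. The scoring values are arbitrary assumptions. It runs in time proportional to the product of the sequence lengths while keeping a single row of the table in memory.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score, keeping only one row of the DP table."""
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]
```

Recovering the alignment itself in linear space requires a divide-and-conquer scheme on top of this row-by-row computation.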
